103 research outputs found

    Genealogy Reconstruction: Methods and applications in cancer and wild populations

    Genealogy reconstruction is widely used in biology when relationships among entities are studied. Phylogenies, or evolutionary trees, show the differences between species. They are of profound importance because they help to obtain a better understanding of evolutionary processes. Pedigrees, or family trees, on the other hand, visualize the relatedness between individuals in a population. The reconstruction of pedigrees, and the inference of parentage in general, is now a cornerstone of molecular ecology. Applications include the direct inference of gene flow and the estimation of the effective population size and of parameters describing the population’s mating behaviour, such as rates of inbreeding. In the first part of this thesis, we construct genealogies of various types of cancer. Histopathological classification of human tumors relies in part on the degree of differentiation of the tumor sample. To date, there is no objective systematic method to categorize tumor subtypes by maturation. We introduce a novel algorithm to rank tumor subtypes according to the dissimilarity of their gene expression from that of stem cells and fully differentiated tissue, and thereby construct a phylogenetic tree of cancer. We validate our methodology with expression data of leukemia and liposarcoma subtypes and then apply it to a broader group of sarcomas and of breast cancer subtypes. The ranking of tumor subtypes resulting from the application of our methodology allows the identification of genes correlated with differentiation and may help to identify novel therapeutic targets. Our algorithm represents the first phylogeny-based tool to analyze the differentiation status of human tumors. In contrast to asexually reproducing cancer cell populations, pedigrees of sexually reproducing populations cannot be represented by phylogenetic trees.
    Pedigrees are directed acyclic graphs (DAGs) and therefore more closely resemble phylogenetic networks, where reticulate events are indicated by vertices with two incoming arcs. In the second part of the thesis, we present a software package for pedigree reconstruction in natural populations using co-dominant genomic markers such as microsatellites and single nucleotide polymorphisms (SNPs). If available, the algorithm makes use of prior information such as known relationships (sub-pedigrees) or the age and sex of individuals. Statistical confidence is estimated by Markov chain Monte Carlo (MCMC) sampling. The accuracy of the algorithm is demonstrated for simulated data as well as an empirical data set with known pedigree. The parentage inference is robust even in the presence of genotyping errors. We further demonstrate the accuracy of the algorithm on simulated clonal populations. We show that the joint estimation of parameters of interest, such as the rate of self-fertilization or clonality, is possible with high accuracy even with marker panels of moderate power. Classical methods can assign only a very limited number of statistically significant parentages in this case and would therefore fail. The method is implemented in fast, easy-to-use open-source software that scales to large datasets with many thousands of individuals.

    Contents:
    Abstract; Acknowledgments
    1 Introduction
    2 Cancer Phylogenies: 2.1 Introduction; 2.2 Background (2.2.1 Phylogenetic Trees; 2.2.2 Microarrays); 2.3 Methods (2.3.1 Dataset compilation; 2.3.2 Statistical Methods and Analysis; 2.3.3 Comparison of our methodology to other methods); 2.4 Results (2.4.1 Phylogenetic tree reconstruction method; 2.4.2 Comparison of tree reconstruction methods to other algorithms; 2.4.3 Systematic analysis of methods and parameters); 2.5 Discussion
    3 Wild Pedigrees: 3.1 Introduction; 3.2 The molecular ecologist’s tools of the trade (3.2.1 Sibship inference and parental reconstruction; 3.2.2 Parentage and paternity inference; 3.2.3 Multigenerational pedigree reconstruction); 3.3 Background (3.3.1 Pedigrees; 3.3.2 Genotypes; 3.3.3 Mendelian segregation probability; 3.3.4 LOD Scores; 3.3.5 Genotyping Errors; 3.3.6 IBD coefficients; 3.3.7 Bayesian MCMC); 3.4 Methods (3.4.1 Likelihood Model; 3.4.2 Efficient Likelihood Calculation; 3.4.3 Maximum Likelihood Pedigree; 3.4.4 Full siblings; 3.4.5 Algorithm; 3.4.6 Missing Values; 3.4.7 Allele frequencies; 3.4.8 Rates of Self-fertilization; 3.4.9 Rates of Clonality); 3.5 Results (3.5.1 Real Microsatellite Data; 3.5.2 Simulated Human Population; 3.5.3 Simulated Clonal Plant Population); 3.6 Discussion
    4 Conclusions
    A FRANz: A.1 Availability; A.2 Input files (A.2.1 Main input file; A.2.2 Known relationships; A.2.3 Allele frequencies; A.2.4 Sampling locations); A.3 Output files; A.4 Web 2.0 Interface
    List of Figures; List of Tables; List of Abbreviations; Bibliography; Curriculum Vitae
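
The thesis abstract above describes pedigrees as directed acyclic graphs in which an individual with two known parents is a vertex with two incoming arcs. The following Python sketch illustrates only that data structure, not the FRANz algorithm: it stores a pedigree as a mapping from each individual to its known parents and uses Kahn's topological sort to confirm that no individual is its own ancestor. The function names `pedigree_order` and `two_parent_vertices` and the toy individuals are hypothetical.

```python
from collections import deque


def pedigree_order(parents):
    """Return individuals ordered so that parents precede offspring.

    `parents` maps an individual to a tuple of 0, 1 or 2 known parents.
    A selfed individual may list the same parent twice.
    Raises ValueError if the graph is not a valid pedigree DAG.
    """
    individuals = set(parents)
    for ps in parents.values():
        individuals.update(ps)

    for child, ps in parents.items():
        if len(ps) > 2:
            raise ValueError(f"{child} has more than two parents")

    # Arcs point from parent to offspring; indegree = number of parent arcs.
    indegree = {v: len(parents.get(v, ())) for v in individuals}
    offspring = {v: [] for v in individuals}
    for child, ps in parents.items():
        for p in ps:
            offspring[p].append(child)

    # Kahn's algorithm: repeatedly emit vertices with no unprocessed parents.
    queue = deque(sorted(v for v in individuals if indegree[v] == 0))
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for c in offspring[v]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)

    if len(order) != len(individuals):
        raise ValueError("pedigree contains a cycle")
    return order


def two_parent_vertices(parents):
    """Individuals with two incoming arcs (both parents known)."""
    return sorted(v for v, ps in parents.items() if len(ps) == 2)
```

For example, `pedigree_order({"child": ("mother", "father")})` places both parents before `"child"`, while a mapping in which two individuals are each other's parent raises `ValueError`.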

    FRANz: reconstruction of wild multi-generation pedigrees

    Summary: We present a software package for pedigree reconstruction in natural populations using co-dominant genomic markers such as microsatellites and single nucleotide polymorphisms (SNPs). If available, the algorithm makes use of prior information such as known relationships (sub-pedigrees) or the age and sex of individuals. Statistical confidence is estimated by Markov chain Monte Carlo (MCMC) sampling. The accuracy of the algorithm is demonstrated for simulated data as well as an empirical dataset with known pedigree. The parentage inference is robust even in the presence of genotyping errors.

    Cross-study validation for the assessment of prediction algorithms

    Motivation: Numerous competing algorithms for prediction in high-dimensional settings have been developed in the statistical and machine-learning literature. Learning algorithms and the prediction models they generate are typically evaluated on the basis of cross-validation error estimates in a few exemplary datasets. However, in most applications, the ultimate goal of prediction modeling is to provide accurate predictions for independent samples obtained in different settings. Cross-validation within exemplary datasets may not adequately reflect performance in the broader application context. Methods: We develop and implement a systematic approach to ‘cross-study validation’, to replace or supplement conventional cross-validation when evaluating high-dimensional prediction models in independent datasets. We illustrate it via simulations and in a collection of eight estrogen-receptor positive breast cancer microarray gene-expression datasets, where the objective is predicting distant metastasis-free survival (DMFS). We compute the C-index for all pairwise combinations of training and validation datasets. We evaluate several alternatives for summarizing the pairwise validation statistics, and compare these to conventional cross-validation. Results: Our data-driven simulations and our application to survival prediction with eight breast cancer microarray datasets suggest that standard cross-validation produces inflated discrimination accuracy for all algorithms considered, when compared to cross-study validation. Furthermore, the ranking of learning algorithms differs, suggesting that algorithms performing best in cross-validation may be suboptimal when evaluated through independent validation. Availability: The survHD: Survival in High Dimensions package (http://www.bitbucket.org/lwaldron/survhd) will be made available through Bioconductor. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
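
The pairwise scheme described in this abstract can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the survHD implementation: `c_index` is a plain version of Harrell's concordance index for right-censored data, `cross_study_matrix` and `off_diagonal_mean` are hypothetical helper names, each study is assumed to be a dict with keys `x`, `time` and `event`, and `train` is a caller-supplied function that returns a risk-scoring model. The off-diagonal mean is just one possible summary; the abstract does not say which summaries the authors prefer.

```python
from itertools import combinations


def c_index(times, events, risks):
    """Harrell's concordance index for right-censored survival data.

    A pair is comparable when the subject with the shorter time had an
    observed event. It is concordant when that subject also has the
    higher predicted risk; tied risks count as 0.5.
    """
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied times are skipped in this simplified version
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # earlier time is censored: pair is not comparable
        comparable += 1
        if risks[first] > risks[second]:
            concordant += 1.0
        elif risks[first] == risks[second]:
            concordant += 0.5
    return concordant / comparable if comparable else float("nan")


def cross_study_matrix(studies, train):
    """Entry [i][j] is the C-index of the model trained on study i and
    evaluated on study j. Diagonal entries are resubstitution estimates
    here and should not be read as validation results."""
    models = [train(study) for study in studies]
    return [
        [
            c_index(s["time"], s["event"], [model(x) for x in s["x"]])
            for s in studies
        ]
        for model in models
    ]


def off_diagonal_mean(matrix):
    """Average C-index over all train/validation pairs with i != j."""
    n = len(matrix)
    values = [matrix[i][j] for i in range(n) for j in range(n) if i != j]
    return sum(values) / len(values)
```

Comparing the off-diagonal entries with a within-study cross-validation estimate for the same learning algorithm is the kind of contrast the abstract reports.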

    ESTs and EST-linked polymorphisms for genetic mapping and phylogenetic reconstruction in the guppy, Poecilia reticulata

    Background: The guppy, Poecilia reticulata, is a well-known model organism for studying inheritance and variation of male ornamental traits as well as adaptation to different river habitats. However, genomic resources for studying this important model were not previously widely available. Results: With the aim of generating molecular markers for genetic mapping of the guppy, cDNA libraries were constructed from embryos and different adult organs to generate expressed sequence tags (ESTs). About 18,000 ESTs were annotated according to BLASTN and BLASTX results and the sequence information from the 3' UTRs was exploited to generate PCR primers for re-sequencing of genomic DNA from different wild type strains. By comparison of EST-linked genomic sequences from at least four different ecotypes, about 1,700 polymorphisms were identified, representing about 400 distinct genes. Two interconnected MySQL databases were built to organize the ESTs and markers, respectively. A robust phylogeny of the guppy was reconstructed, based on 10 different nuclear genes. Conclusion: Our EST and marker databases provide useful tools for genetic mapping and phylogenetic studies of the guppy.

    curatedOvarianData: clinically annotated data for the ovarian cancer transcriptome

    This article introduces a manually curated data collection for gene expression meta-analysis of patients with ovarian cancer and software for reproducible preparation of similar databases. This resource provides uniformly prepared microarray data for 2970 patients from 23 studies with curated and documented clinical metadata. It allows users to efficiently identify studies and patient subgroups of interest for analysis and to perform meta-analysis immediately without the challenges posed by harmonizing heterogeneous microarray technologies, study designs, expression data processing methods and clinical data formats. We confirm that the recently proposed biomarker CXCL12 is associated with patient survival, independently of stage and optimal surgical debulking, which was possible only through meta-analysis owing to insufficient sample sizes of the individual studies. The database is implemented as the curatedOvarianData Bioconductor package for the R statistical computing language, providing a comprehensive and flexible resource for clinically oriented investigation of the ovarian cancer transcriptome. The package and pipeline for producing it are available from http://bcb.dfci.harvard.edu/ovariancancer. Database URL: http://bcb.dfci.harvard.edu/ovariancance

    Genomic profiling of NETs: A comprehensive analysis of the RADIANT trials

    Neuroendocrine tumors (NETs) have historically been subcategorized according to histologic features and the site of anatomic origin. Here, we characterize the genomic alterations in patients enrolled in three phase 3 clinical trials of NET of different anatomic origins and assess the potential correlation with clinical outcomes. Whole-exome and targeted panel sequencing was used to characterize 225 NET samples collected in the RADIANT series of clinical trials. Genomic profiles were analyzed along with nongenomic biomarker data on tumor grade and on circulating chromogranin A (CgA) and neuron-specific enolase (NSE) levels from the same patients. Our results highlight recurrent large-scale chromosomal alterations as a common theme among NET. Although the specific pattern of chromosomal alterations differed between tumor subtypes, evidence for generalized chromosomal instability (CIN) was observed across all primary sites of NET. In pancreatic NET, higher CIN showed a trend toward longer survival that did not reach statistical significance (HR, 0.55, P = 0.077), whereas in gastrointestinal NET, lower CIN was associated with longer survival (HR, 0.44, P = 0.0006). Our multivariate analyses demonstrated that, when combined with other clinical data among patients with progressive advanced NETs, chromosomal-level alteration adds important prognostic information. Large-scale CIN is a common feature of NET, and specific patterns of chromosomal gain and loss appeared to have independent prognostic value in NET subtypes. However, whether CIN in general has clinical significance in NET requires validation in a larger patient cohort and warrants further mechanistic studies.

    Molecular analysis of a male breast cancer patient with prolonged stable disease under mTOR/PI3K inhibitors BEZ235/everolimus

    The mTORC1 inhibitor everolimus (Afinitor/RAD001) has been approved for multiple cancer indications, including ER(+)/HER2(-) metastatic breast cancer. However, the combination of everolimus with the dual PI3K/mTOR inhibitor BEZ235 was shown to be more efficacious than either everolimus or BEZ235 alone in preclinical models. Herein, we describe a male breast cancer (MBC) patient who was diagnosed with hormone receptor-positive (HR(+))/HER2(-) stage IIIA invasive ductal carcinoma and sequentially treated with chemoradiotherapy and hormonal therapy. Upon the development of metastases, the patient began a combination regimen of 200 mg twice-daily BEZ235 and 2.5 mg weekly everolimus. The patient sustained prolonged stable disease for 18 months while undergoing the therapy, before his tumor progressed again. We therefore sought both to better understand MBC and to investigate the molecular mechanisms underlying the patient's sensitivity and subsequent resistance to the BEZ235/everolimus combination therapy. Genomic and immunohistochemical analyses were performed on samples collected from the initial invasive ductal carcinoma before treatment and from a metastasis after progression on the BEZ235/everolimus combination treatment. Both tumors were relatively quiet genomically, with no overlap with recurrent MBC alterations in the literature. Markers of PI3K/mTOR pathway hyperactivation were not identified in the pretreatment sample, which complements previous reports of HR(+) female breast cancers being responsive to mTOR inhibition without this activation. The postprogression sample, however, demonstrated greater than fivefold increased estrogen receptor and pathogenesis-related protein expression, which could have constrained the PI3K/mTOR pathway inhibition by BEZ235/everolimus. Overall, these analyses add to the limited knowledge of MBC genetics and treatment.