Genealogy Reconstruction: Methods and applications in cancer and wild populations
Genealogy reconstruction is widely used in biology when relationships among entities are studied. Phylogenies, or evolutionary trees, show the differences between species. They are of profound importance because they help to obtain better understandings of evolutionary processes. Pedigrees, or family trees, on the other hand visualize the relatedness between individuals in a population. The reconstruction of pedigrees and the inference of parentage in general is now a cornerstone in molecular ecology. Applications include the direct inference of gene flow, estimation of the effective population size and parameters describing the population’s mating behaviour such as rates of inbreeding.
In the first part of this thesis, we construct genealogies of various types of cancer. Histopathological classification of human tumors relies in part on the degree of differentiation of the tumor sample. To date, there is no objective systematic method to categorize tumor subtypes by maturation. We introduce a novel algorithm to rank tumor subtypes according to the dissimilarity of their gene expression from that of stem cells and fully differentiated tissue, and thereby construct a phylogenetic tree of cancer. We validate our methodology with expression data of leukemia and liposarcoma subtypes and then apply it to a broader group of sarcomas and of breast cancer subtypes. This ranking of tumor subtypes resulting from the application of our methodology allows the identification of genes correlated with differentiation and may help to identify novel therapeutic targets. Our algorithm represents the first phylogeny-based tool to analyze the differentiation status of human tumors.
In contrast to asexually reproducing cancer cell populations, pedigrees of sexually reproducing populations cannot be represented by phylogenetic trees. Pedigrees are directed acyclic graphs (DAGs) and therefore more closely resemble phylogenetic networks, where reticulate events are indicated by vertices with two incoming arcs. We present a software package for pedigree reconstruction in natural populations using co-dominant genomic markers such as microsatellites and single nucleotide polymorphisms (SNPs) in the second part of the thesis. If available, the algorithm makes use of prior information such as known relationships (sub-pedigrees) or the age and sex of individuals. Statistical confidence is estimated by Markov chain Monte Carlo (MCMC) sampling. The accuracy of the algorithm is demonstrated for simulated data as well as an empirical data set with known pedigree. The parentage inference is robust even in the presence of genotyping errors. We further demonstrate the accuracy of the algorithm on simulated clonal populations. We show that the joint estimation of parameters of interest such as the rate of self-fertilization or clonality is possible with high accuracy even with marker panels of moderate power. Classical methods can only assign a very limited number of statistically significant parentages in this case and would therefore fail. The method is implemented in fast and easy-to-use open-source software that scales to large datasets with many thousands of individuals.
Abstract
Acknowledgments
1 Introduction
2 Cancer Phylogenies
2.1 Introduction
2.2 Background
2.2.1 Phylogenetic Trees
2.2.2 Microarrays
2.3 Methods
2.3.1 Dataset compilation
2.3.2 Statistical Methods and Analysis
2.3.3 Comparison of our methodology to other methods
2.4 Results
2.4.1 Phylogenetic tree reconstruction method
2.4.2 Comparison of tree reconstruction methods to other algorithms
2.4.3 Systematic analysis of methods and parameters
2.5 Discussion
3 Wild Pedigrees
3.1 Introduction
3.2 The molecular ecologist’s tools of the trade
3.2.1 Sibship inference and parental reconstruction
3.2.2 Parentage and paternity inference
3.2.3 Multigenerational pedigree reconstruction
3.3 Background
3.3.1 Pedigrees
3.3.2 Genotypes
3.3.3 Mendelian segregation probability
3.3.4 LOD Scores
3.3.5 Genotyping Errors
3.3.6 IBD coefficients
3.3.7 Bayesian MCMC
3.4 Methods
3.4.1 Likelihood Model
3.4.2 Efficient Likelihood Calculation
3.4.3 Maximum Likelihood Pedigree
3.4.4 Full siblings
3.4.5 Algorithm
3.4.6 Missing Values
3.4.7 Allele frequencies
3.4.8 Rates of Self-fertilization
3.4.9 Rates of Clonality
3.5 Results
3.5.1 Real Microsatellite Data
3.5.2 Simulated Human Population
3.5.3 Simulated Clonal Plant Population
3.6 Discussion
4 Conclusions
A FRANz
A.1 Availability
A.2 Input files
A.2.1 Main input file
A.2.2 Known relationships
A.2.3 Allele frequencies
A.2.4 Sampling locations
A.3 Output files
A.4 Web 2.0 Interface
List of Figures
List of Tables
List of Abbreviations
Bibliography
Curriculum Vitae
FRANz: reconstruction of wild multi-generation pedigrees
Summary: We present a software package for pedigree reconstruction in natural populations using co-dominant genomic markers such as microsatellites and single nucleotide polymorphisms (SNPs). If available, the algorithm makes use of prior information such as known relationships (sub-pedigrees) or the age and sex of individuals. Statistical confidence is estimated by Markov Chain Monte Carlo (MCMC) sampling. The accuracy of the algorithm is demonstrated for simulated data as well as an empirical dataset with known pedigree. The parentage inference is robust even in the presence of genotyping errors.
Cross-study validation for the assessment of prediction algorithms
Motivation: Numerous competing algorithms for prediction in high-dimensional settings have been developed in the statistical and machine-learning literature. Learning algorithms and the prediction models they generate are typically evaluated on the basis of cross-validation error estimates in a few exemplary datasets. However, in most applications, the ultimate goal of prediction modeling is to provide accurate predictions for independent samples obtained in different settings. Cross-validation within exemplary datasets may not adequately reflect performance in the broader application context. Methods: We develop and implement a systematic approach to ‘cross-study validation’, to replace or supplement conventional cross-validation when evaluating high-dimensional prediction models in independent datasets. We illustrate it via simulations and in a collection of eight estrogen-receptor positive breast cancer microarray gene-expression datasets, where the objective is predicting distant metastasis-free survival (DMFS). We computed the C-index for all pairwise combinations of training and validation datasets. We evaluate several alternatives for summarizing the pairwise validation statistics, and compare these to conventional cross-validation. Results: Our data-driven simulations and our application to survival prediction with eight breast cancer microarray datasets suggest that standard cross-validation produces inflated discrimination accuracy for all algorithms considered, when compared to cross-study validation. Furthermore, the ranking of learning algorithms differs, suggesting that algorithms performing best in cross-validation may be suboptimal when evaluated through independent validation. Availability: The survHD: Survival in High Dimensions package (http://www.bitbucket.org/lwaldron/survhd) will be made available through Bioconductor. Supplementary information: Supplementary data are available at Bioinformatics online.
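The pairwise train/validate scheme described in this abstract can be illustrated with a minimal Python sketch. It is not the survHD implementation: the simplified C-index (ties in time are skipped), the toy one-marker learner `fit_sign_model`, the helper `cross_study_matrix`, and the two tiny studies are all hypothetical and exist only to show the shape of the computation.

```python
from itertools import combinations


def c_index(times, events, risks):
    """Simplified Harrell's C: share of comparable pairs ranked correctly by risk."""
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied times are skipped in this simplified version
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # earlier observation is censored: pair is not comparable
        comparable += 1
        if risks[first] > risks[second]:
            concordant += 1.0
        elif risks[first] == risks[second]:
            concordant += 0.5
    return concordant / comparable if comparable else float("nan")


def fit_sign_model(study):
    """Hypothetical toy learner: learns only whether the marker raises or lowers risk."""
    times, events, marker = study
    sign = 1.0 if c_index(times, events, marker) >= 0.5 else -1.0
    return lambda xs: [sign * x for x in xs]


def cross_study_matrix(studies, fit):
    """Train on each study and validate on every other study (off-diagonal cells only)."""
    matrix = {}
    for train_name, train_data in studies.items():
        model = fit(train_data)
        for test_name, (times, events, marker) in studies.items():
            if test_name == train_name:
                continue  # the diagonal would be filled by conventional cross-validation
            matrix[(train_name, test_name)] = c_index(times, events, model(marker))
    return matrix


# Made-up data: (survival times, event indicators, one marker value per patient).
studies = {
    "A": ([5, 3, 8, 1], [1, 1, 0, 1], [0.2, 0.6, 0.1, 0.9]),
    "B": ([2, 7, 4, 9], [1, 0, 1, 1], [0.8, 0.3, 0.5, 0.1]),
}
matrix = cross_study_matrix(studies, fit_sign_model)
mean_off_diagonal = sum(matrix.values()) / len(matrix)
```

Averaging the off-diagonal cells is only one possible summary; the abstract states that several alternatives for summarizing the pairwise statistics were evaluated.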
Identification of Nine Genomic Regions of Amplification in Urothelial Carcinoma, Correlation with Stage, and Potential Prognostic and Therapeutic Value
We performed a genome-wide analysis of 164 urothelial carcinoma samples and 27 bladder cancer cell lines to identify copy number changes associated with disease characteristics, and examined the association of amplification events with stage and grade of disease. Multiplex inversion probe (MIP) analysis, a recently developed genomic technique, was used to study 80 urothelial carcinomas to identify mutations and copy number changes. Selected amplification events were then analyzed in a validation cohort of 84 bladder cancers by multiplex ligation-dependent probe assay (MLPA). In the MIP analysis, 44 regions of significant copy number change were identified using GISTIC. Nine gene-containing regions of amplification were selected for validation in the second cohort by MLPA. Amplification events at these 9 genomic regions were found to correlate strongly with stage, being seen in only 2 of 23 (9%) Ta grade 1 or 1–2 cancers, in contrast to 31 of 61 (51%) Ta grade 3 and T2 grade 2 cancers, p<0.001. These observations suggest that analysis of genomic amplification of these 9 regions might help distinguish non-invasive from invasive urothelial carcinoma, although further study is required. Both MIP and MLPA methods perform well on formalin-fixed paraffin-embedded DNA, enhancing their potential clinical use. Furthermore, several of the amplified genes identified here (ERBB2, MDM2, CCND1) are potential therapeutic targets.
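The counts reported in this abstract (2 of 23 versus 31 of 61) can be checked with a short Python sketch. The abstract does not name the statistical test behind its p-value, so the two-sided Fisher exact test used here is an assumption chosen for illustration, and `fisher_exact_two_sided` is a hypothetical helper written with the standard library only.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)  # small tolerance for float ties
    )


# Rows: lower-stage group (2 amplified, 21 not), higher-stage group (31 amplified, 30 not).
pct_low = round(100 * 2 / 23)
pct_high = round(100 * 31 / 61)
p_value = fisher_exact_two_sided(2, 21, 31, 30)
```

The rounded percentages reproduce the 9% and 51% given in the abstract; the p-value from this sketch need not match the published figure exactly, because the original test may differ.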
ESTs and EST-linked polymorphisms for genetic mapping and phylogenetic reconstruction in the guppy, Poecilia reticulata
Background: The guppy, Poecilia reticulata, is a well-known model organism for studying inheritance and variation of male ornamental traits as well as adaptation to different river habitats. However, genomic resources for studying this important model were not previously widely available. Results: With the aim of generating molecular markers for genetic mapping of the guppy, cDNA libraries were constructed from embryos and different adult organs to generate expressed sequence tags (ESTs). About 18,000 ESTs were annotated according to BLASTN and BLASTX results and the sequence information from the 3' UTRs was exploited to generate PCR primers for re-sequencing of genomic DNA from different wild type strains. By comparison of EST-linked genomic sequences from at least four different ecotypes, about 1,700 polymorphisms were identified, representing about 400 distinct genes. Two interconnected MySQL databases were built to organize the ESTs and markers, respectively. A robust phylogeny of the guppy was reconstructed, based on 10 different nuclear genes. Conclusion: Our EST and marker databases provide useful tools for genetic mapping and phylogenetic studies of the guppy.
curatedOvarianData: clinically annotated data for the ovarian cancer transcriptome
This article introduces a manually curated data collection for gene expression meta-analysis of patients with ovarian cancer and software for reproducible preparation of similar databases. This resource provides uniformly prepared microarray data for 2970 patients from 23 studies with curated and documented clinical metadata. It allows users to efficiently identify studies and patient subgroups of interest for analysis and to perform meta-analysis immediately without the challenges posed by harmonizing heterogeneous microarray technologies, study designs, expression data processing methods and clinical data formats. We confirm that the recently proposed biomarker CXCL12 is associated with patient survival, independently of stage and optimal surgical debulking, which was possible only through meta-analysis owing to insufficient sample sizes of the individual studies. The database is implemented as the curatedOvarianData Bioconductor package for the R statistical computing language, providing a comprehensive and flexible resource for clinically oriented investigation of the ovarian cancer transcriptome. The package and pipeline for producing it are available from http://bcb.dfci.harvard.edu/ovariancancer. Database URL: http://bcb.dfci.harvard.edu/ovariancance
Aberration in DNA Methylation in B-Cell Lymphomas Has a Complex Origin and Increases with Disease Severity
Despite mounting evidence that epigenetic abnormalities play a key role in cancer biology, their contributions to the malignant phenotype remain poorly understood. Here we studied genome-wide DNA methylation in normal B-cell populations and subtypes of B-cell non-Hodgkin lymphoma: follicular lymphoma and diffuse large B-cell lymphomas. These lymphomas display striking and progressive intra-tumor heterogeneity and also inter-patient heterogeneity in their cytosine methylation patterns. Epigenetic heterogeneity is initiated in normal germinal center B-cells, increases markedly with disease aggressiveness, and is associated with unfavorable clinical outcome. Moreover, patterns of abnormal methylation vary depending upon chromosomal regions, gene density and the status of neighboring genes. DNA methylation abnormalities arise via two distinct processes: i) lymphomagenic transcriptional regulators perturb promoter DNA methylation in a target gene-specific manner, and ii) aberrant epigenetic states tend to spread to neighboring promoters in the absence of CTCF insulator binding sites
Genomic profiling of NETs: A comprehensive analysis of the RADIANT trials
Neuroendocrine tumors (NETs) have historically been subcategorized according to histologic features and the site of anatomic origin. Here, we characterize the genomic alterations in patients enrolled in three phase 3 clinical trials of NET of different anatomic origins and assess the potential correlation with clinical outcomes. Whole-exome and targeted panel sequencing was used to characterize 225 NET samples collected in the RADIANT series of clinical trials. Genomic profiling of NET was analyzed along with nongenomic biomarker data on the tumor grade and circulating chromogranin A (CgA) and neuron-specific enolase (NSE) levels from these patients enrolled in clinical trials. Our results highlight recurrent large-scale chromosomal alterations as a common theme among NET. Although the specific pattern of chromosomal alterations differed between tumor subtypes, evidence for generalized chromosomal instability (CIN) was observed across all primary sites of NET. In pancreatic NET, higher CIN showed a trend toward longer survival that did not reach statistical significance (HR, 0.55, P = 0.077), whereas in gastrointestinal NET, lower CIN was associated with longer survival (HR, 0.44, P = 0.0006). Our multivariate analyses demonstrated that when combined with other clinical data among patients with progressive advanced NETs, chromosomal level alteration adds important prognostic information. Large-scale CIN is a common feature of NET, and specific patterns of chromosomal gain and loss appeared to have independent prognostic value in NET subtypes. However, whether CIN in general has clinical significance in NET requires validation in a larger patient cohort and warrants further mechanistic studies.
FGFR3 expression in primary and metastatic urothelial carcinoma of the bladder
While fibroblast growth factor receptor 3 (FGFR3) is frequently mutated or overexpressed in nonmuscle-invasive urothelial carcinoma (UC), the prevalence of FGFR3 protein expression and mutation remains unknown in muscle-invasive disease. FGFR3 protein and mRNA expression, mutational status, and copy number variation were retrospectively analyzed in 231 patients with formalin-fixed paraffin-embedded primary UCs, 33 metastases, and 14 paired primary and metastatic tumors using the following methods: immunohistochemistry, NanoString nCounter™, OncoMap or Affymetrix OncoScan™ array, and Gain and Loss Analysis of DNA and Genomic Identification of Significant Targets in Cancer software. FGFR3 immunohistochemistry staining was present in 29% of primary UCs and 49% of metastases and did not impact overall survival (P = 0.89, primary tumors; P = 0.78, metastases). FGFR3 mutations were observed in 2% of primary tumors and 9% of metastases. Mutant tumors expressed higher levels of FGFR3 mRNA than wild-type tumors (P < 0.001). FGFR3 copy number gain and loss were rare events in primary and metastatic tumors (0.8% each; 3.0% and 12.3%, respectively). FGFR3 immunohistochemistry staining is present in one third of primary muscle-invasive UCs and half of metastases, while FGFR3 mutations and copy number changes are relatively uncommon.
Molecular analysis of a male breast cancer patient with prolonged stable disease under mTOR/PI3K inhibitors BEZ235/everolimus
The mTORC1 inhibitor everolimus (Afinitor/RAD001) has been approved for multiple cancer indications, including ER(+)/HER2(-) metastatic breast cancer. However, the combination of everolimus with the dual PI3K/mTOR inhibitor BEZ235 was shown to be more efficacious than either everolimus or BEZ235 alone in preclinical models. Herein, we describe a male breast cancer (MBC) patient who was diagnosed with hormone receptor-positive (HR(+))/HER2(-) stage IIIA invasive ductal carcinoma and sequentially treated with chemoradiotherapy and hormonal therapy. Upon the development of metastases, the patient began a 200 mg twice-daily BEZ235 and 2.5 mg weekly everolimus combination regimen. The patient sustained a prolonged stable disease of 18 mo while undergoing the therapy, before his tumor progressed again. Therefore, we sought to both better understand MBC and investigate the underlying molecular mechanisms of the patient's sensitivity and subsequent resistance to the BEZ235/everolimus combination therapy. Genomic and immunohistochemical analyses were performed on samples collected from the initial invasive ductal carcinoma pretreatment and a metastasis postprogression on the BEZ235/everolimus combination treatment. Both tumors were relatively quiet genomically with no overlap to recurrent MBC alterations in the literature. Markers of PI3K/mTOR pathway hyperactivation were not identified in the pretreatment sample, which complements previous reports of HR(+) female breast cancers being responsive to mTOR inhibition without this activation. The postprogression sample, however, demonstrated greater than fivefold increased estrogen receptor and pathogenesis-related protein expression, which could have constrained the PI3K/mTOR pathway inhibition by BEZ235/everolimus. Overall, these analyses have augmented the limited episteme on MBC genetics and treatment