68 research outputs found

    Every which way? On predicting tumor evolution using cancer progression models

    Successful prediction of the likely paths of tumor progression is valuable for diagnostic, prognostic, and treatment purposes. Cancer progression models (CPMs) use cross-sectional samples to identify restrictions in the order of accumulation of driver mutations and thus CPMs encode the paths of tumor progression. Here we analyze the performance of four CPMs to examine whether they can be used to predict the true distribution of paths of tumor progression and to estimate evolutionary unpredictability. Employing simulations we show that if fitness landscapes are single peaked (have a single fitness maximum) there is good agreement between true and predicted distributions of paths of tumor progression when sample sizes are large, but performance is poor with the currently common much smaller sample sizes. Under multi-peaked fitness landscapes (i.e., those with multiple fitness maxima), performance is poor and improves only slightly with sample size. In all cases, detection regime (when tumors are sampled) is a key determinant of performance. Estimates of evolutionary unpredictability from the best performing CPM, among the four examined, tend to overestimate the true unpredictability and the bias is affected by detection regime; CPMs could be useful for estimating upper bounds to the true evolutionary unpredictability. Analysis of twenty-two cancer data sets shows low evolutionary unpredictability for several of the data sets. But most of the predictions of paths of tumor progression are very unreliable, and unreliability increases with the number of features analyzed. Our results indicate that CPMs could be valuable tools for predicting cancer progression but that, currently, obtaining useful predictions of paths of tumor progression from CPMs is dubious, and emphasize the need for methodological work that can account for the probably multi-peaked fitness landscapes in cancer. Work partially supported by BFU2015-67302-R (MINECO/FEDER, EU) to RDU. CV supported by PEJD-2016-BMD-2116 from Comunidad de Madrid to RD
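
    The idea that order restrictions "encode the paths of tumor progression" can be illustrated with a minimal Python sketch. It is not the method of any of the four CPMs: the gene names, the example restrictions, the assumption that every currently allowed mutation is equally likely to come next, and the use of Shannon entropy as the unpredictability measure are all assumptions made here for illustration.

```python
import math

def enumerate_paths(restrictions):
    """Enumerate every mutation order compatible with the restrictions.

    `restrictions` maps each gene to the set of genes that must already be
    mutated before it (a hypothetical encoding chosen for this sketch).
    Returns {path: probability}, assuming each allowed next mutation is
    equally likely at every step.
    """
    genes = sorted(restrictions)
    paths = {}

    def extend(path, prob):
        acquired = set(path)
        if len(path) == len(genes):
            paths[tuple(path)] = prob
            return
        # A gene can mutate next only if all of its prerequisites are present.
        allowed = [g for g in genes
                   if g not in acquired and restrictions[g] <= acquired]
        for g in allowed:
            extend(path + [g], prob / len(allowed))

    extend([], 1.0)
    return paths

def path_entropy(paths):
    """Shannon entropy (in bits) of the distribution over paths."""
    return -sum(p * math.log2(p) for p in paths.values() if p > 0)

# Hypothetical restrictions: C needs A; D needs both B and C.
restrictions = {"A": set(), "B": set(), "C": {"A"}, "D": {"B", "C"}}
paths = enumerate_paths(restrictions)
for path, prob in sorted(paths.items()):
    print(" -> ".join(path), prob)
print("entropy (bits):", path_entropy(paths))
```

    Under these assumptions, a higher entropy means the probability mass is spread over more paths, so progression is less predictable.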

    OncoSimulR: Genetic simulation with arbitrary epistasis and mutator genes in asexual populations

    OncoSimulR implements forward-time genetic simulations of biallelic loci in asexual populations with special focus on cancer progression. Fitness can be defined as an arbitrary function of genetic interactions between multiple genes or modules of genes, including epistasis, restrictions in the order of accumulation of mutations, and order effects. Mutation rates can differ among genes, and can be affected by (anti)mutator genes. Also available are sampling from simulations (including single-cell sampling), plotting the genealogical relationships of clones and generating and plotting fitness landscapes. Supported by BFU2015-67302-R (MINECO/FEDER, EU)
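
    To make the notions of epistasis and fitness landscape concrete, here is a small Python sketch. It does not use or reproduce the OncoSimulR interface (OncoSimulR is an R package); the multiplicative fitness model, the function names, and the parameter values are assumptions chosen for illustration. It builds the landscape of all genotypes over a few biallelic loci and lists the fitness peaks, that is, genotypes fitter than all of their single-mutation neighbours.

```python
from itertools import product

def fitness(genotype, effects, epistasis):
    """Multiplicative fitness with per-locus effects and pairwise epistasis.

    genotype  : tuple of 0/1, one entry per biallelic locus
    effects   : per-locus selection coefficients (hypothetical values)
    epistasis : {(i, j): s} extra factor (1 + s) when loci i and j are both mutated
    """
    w = 1.0
    for locus, mutated in enumerate(genotype):
        if mutated:
            w *= 1.0 + effects[locus]
    for (i, j), s in epistasis.items():
        if genotype[i] and genotype[j]:
            w *= 1.0 + s
    return w

def fitness_peaks(n_loci, effects, epistasis):
    """Return genotypes whose fitness exceeds that of every Hamming-1 neighbour."""
    landscape = {g: fitness(g, effects, epistasis)
                 for g in product((0, 1), repeat=n_loci)}
    peaks = []
    for g, w in landscape.items():
        neighbours = [g[:k] + (1 - g[k],) + g[k + 1:] for k in range(n_loci)]
        if all(landscape[n] < w for n in neighbours):
            peaks.append(g)
    return peaks

# Without epistasis the double mutant is the single peak.
single = fitness_peaks(2, [0.1, 0.1], {})
# Strong negative epistasis between the two loci creates two peaks.
multi = fitness_peaks(2, [0.1, 0.1], {(0, 1): -0.5})
print(single, multi)
```

    In this toy case a single negative interaction is enough to turn a single-peaked landscape into a multi-peaked one.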

    Covariation of life-history traits in lacertid lizards: A comparative study

    We analyzed patterns of life-history covariation within a clade of lacertid lizards, using the method of phylogenetically independent contrasts. Examination of allometric relations and correlations among life-history traits showed that species within this clade can be arranged along a single, multivariate axis. At one end of this continuum are small-sized species that mature early, have small clutches of relatively large young, may have multiple broods per year, and have short adult lives. At the other extreme are the larger lacertids with the opposite suite of traits. Much of this pattern can be deduced from two relations: the increase of adult life span with adult body size and the negative allometry of offspring size. After the effects of body size were statistically removed, residuals of adult life span and age at sexual maturity were positively correlated, whereas residuals of the number and size of offspring were negatively correlated. The detection of these size-free relations supports an interpretation of coadaptive adjustments among life-history variables. The pattern of life-history covariation in lacertid lizards differs fundamentally from the "fast-slow" continuum. This gradient reflects a negative association between adult life span and fecundity, whereas both variables are positively correlated among species of lacertid lizards. This work was supported by a grant from the La Caixa Fellowship Program (R.D.-U.

    Gene selection and classification of microarray data using random forest

    BACKGROUND: Selection of relevant genes for sample classification is a common task in most gene expression studies, where researchers try to identify the smallest possible set of genes that can still achieve good predictive performance (for instance, for future use with diagnostic purposes in clinical practice). Many gene selection approaches use univariate (gene-by-gene) rankings of gene relevance and arbitrary thresholds to select the number of genes, can only be applied to two-class problems, and use gene selection ranking criteria unrelated to the classification algorithm. In contrast, random forest is a classification algorithm well suited for microarray data: it shows excellent performance even when most predictive variables are noise, can be used when the number of variables is much larger than the number of observations and in problems involving more than two classes, and returns measures of variable importance. Thus, it is important to understand the performance of random forest with microarray data and its possible use for gene selection. RESULTS: We investigate the use of random forest for classification of microarray data (including multi-class problems) and propose a new method of gene selection in classification problems based on random forest. Using simulated and nine microarray data sets we show that random forest has comparable performance to other classification methods, including DLDA, KNN, and SVM, and that the new gene selection procedure yields very small sets of genes (often smaller than alternative methods) while preserving predictive accuracy. CONCLUSION: Because of its performance and features, random forest and gene selection using random forest should probably become part of the "standard tool-box" of methods for class prediction and gene selection with microarray data

    IDconverter and IDClight: Conversion and annotation of gene and protein IDs

    Background: Researchers involved in the annotation of large numbers of gene, clone or protein identifiers are usually required to perform a one-by-one conversion for each identifier. When the field of research is one such as microarray experiments, this number may be around 30,000. Results: To help researchers map accession numbers and identifiers among clones, genes, proteins and chromosomal positions, we have designed and developed IDconverter and IDClight. They are two user-friendly, freely available web server applications that also provide additional functional information by mapping the identifiers onto pathways, Gene Ontology terms, and literature references. Both tools are high-throughput oriented and include identifiers for the most common genomic databases. These tools have been compared to other similar tools, showing that they are among the fastest and the most up-to-date. Conclusion: These tools provide a fast and intuitive way of enriching the information coming out of high-throughput experiments like microarrays. They can be valuable both to wet-lab researchers and to bioinformaticians. Funding has been provided by Fundación de Investigación Médica Mutua Madrileña and Project TIC2003-09331-C02-02 of the Spanish Ministry of Education and Science (MEC). RD-U is partially supported by the Ramón y Cajal programme of the Spanish ME

    Flexible and Accurate Detection of Genomic Copy-Number Changes from aCGH

    Genomic DNA copy-number alterations (CNAs) are associated with complex diseases, including cancer: CNAs are indeed related to tumoral grade, metastasis, and patient survival. CNAs discovered from array-based comparative genomic hybridization (aCGH) data have been instrumental in identifying disease-related genes and potential therapeutic targets. To be immediately useful in both clinical and basic research scenarios, aCGH data analysis requires accurate methods that do not impose unrealistic biological assumptions and that provide direct answers to the key question, “What is the probability that this gene/region has CNAs?” Current approaches fail, however, to meet these requirements. Here, we introduce reversible jump aCGH (RJaCGH), a new method for identifying CNAs from aCGH; we use a nonhomogeneous hidden Markov model fitted via reversible jump Markov chain Monte Carlo; and we incorporate model uncertainty through Bayesian model averaging. RJaCGH provides an estimate of the probability that a gene/region has CNAs while incorporating interprobe distance and the capability to analyze data on a chromosome or genome-wide basis. RJaCGH outperforms alternative methods, and the performance difference is even larger with noisy data and highly variable interprobe distance, both commonly found features in aCGH data. Furthermore, our probabilistic method allows us to identify minimal common regions of CNAs among samples and can be extended to incorporate expression data. In summary, we provide a rigorous statistical framework for locating genes and chromosomal regions with CNAs with potential applications to cancer and other complex human diseases
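
    The key question, "What is the probability that this gene/region has CNAs?", can be illustrated with a much simpler model than RJaCGH. The Python sketch below is not RJaCGH: it uses a homogeneous hidden Markov model with three fixed states (loss, normal, gain), fixed Gaussian emission parameters and the standard forward-backward algorithm, with no reversible jump MCMC, no interprobe distances and no Bayesian model averaging. All parameter values and the example log-ratios are made up for illustration.

```python
import math

def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def posterior_state_probs(log_ratios, means, sd, stay):
    """Per-probe posterior state probabilities via scaled forward-backward.

    means : emission mean of each hidden state (fixed, hypothetical values)
    sd    : common emission standard deviation
    stay  : probability of remaining in the same state between adjacent probes
    """
    k, n = len(means), len(log_ratios)
    move = (1.0 - stay) / (k - 1)
    trans = [[stay if i == j else move for j in range(k)] for i in range(k)]
    emit = [[gaussian_pdf(x, m, sd) for m in means] for x in log_ratios]

    # Forward pass, normalised at each probe to avoid numerical underflow.
    fwd = []
    for t in range(n):
        if t == 0:
            cur = [emit[0][j] / k for j in range(k)]  # uniform initial distribution
        else:
            cur = [emit[t][j] * sum(fwd[t - 1][i] * trans[i][j] for i in range(k))
                   for j in range(k)]
        total = sum(cur)
        fwd.append([c / total for c in cur])

    # Backward pass, also normalised at each probe.
    bwd = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        raw = [sum(trans[i][j] * emit[t + 1][j] * bwd[t + 1][j] for j in range(k))
               for i in range(k)]
        total = sum(raw)
        bwd[t] = [r / total for r in raw]

    # Combine and renormalise to obtain the posterior at each probe.
    post = []
    for t in range(n):
        raw = [fwd[t][j] * bwd[t][j] for j in range(k)]
        total = sum(raw)
        post.append([r / total for r in raw])
    return post

# Made-up log2 ratios: a normal stretch, a gained stretch, then normal again.
log_ratios = [0.0, 0.05, -0.02, 0.6, 0.55, 0.62, 0.01, -0.03]
post = posterior_state_probs(log_ratios, means=(-0.6, 0.0, 0.6), sd=0.2, stay=0.9)
# Probability of any alteration = 1 - probability of the normal state (index 1).
p_altered = [1.0 - row[1] for row in post]
print([round(p, 3) for p in p_altered])
```

    The output is a probability of alteration for every probe rather than a hard call, which is the kind of direct answer the abstract asks for, although obtained here under far stronger assumptions.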

    Identification of amplified and highly expressed genes in amplicons of the T-cell line huT78 detected by cDNA microarray CGH

    BACKGROUND: Conventional Comparative Genomic Hybridization (CGH) has been widely used for detecting copy number alterations in cancer and for identifying regions containing candidate tumor-responsible genes. Recently, several studies have shown the utility of cDNA microarray CGH for studying gene copy changes in various types of tumors. However, no such studies on T-cell lymphomas have been performed. To date, T-cell lymphomas analyzed by chromosome CGH have revealed only slight copy number alterations and no gene amplifications. RESULTS: In the present study, we describe the characterization of three amplicons of the T-cell line huT78 located at 2q34-q37, 8q23-q24 and 20p, where new amplified and overexpressed genes are found. The use of a cDNA microarray containing 7,657 transcripts allowed the identification of certain genes, such as BCLX, PCNA, FKBP1A, IGFBP2 and cMYC, that are amplified, highly expressed, and also contained in the amplicons on 20p and 2q. The expression of these genes was analyzed in 39 T-cell lymphomas and 3 other T-cell lines. CONCLUSION: Using conventional CGH together with CGH and expression cDNA microarrays, we defined three amplicons in the T-cell line huT78 and identified several novel gene amplifications (BCLX, PCNA, FKBP1A, IGFBP2 and cMYC). We showed that overexpression of the amplified genes could be attributable to gene dosage. We speculate that deregulation of those genes could be important in the development of T-cell lymphomas and/or in the maintenance of T-cell lines.

    Asterias: integrated analysis of expression and aCGH data using an open-source, web-based, parallelized software suite

    Asterias (http://www.asterias.info) is an open-source, web-based suite for the analysis of gene expression and aCGH data. Asterias implements validated statistical methods, and most of the applications use parallel computing, which permits taking advantage of multicore CPUs and computing clusters. Access to, and further analysis of, additional biological information and annotations (PubMed references, Gene Ontology terms, KEGG and Reactome pathways) are available either for individual genes (from clickable links in tables and figures) or sets of genes. The applications range from array normalization, imputation and preprocessing to differential gene expression analysis, class and survival prediction, and aCGH analysis. The source code is available, allowing for extension and reuse of the software. The links and analysis of additional functional information, parallelization of computation and open-source availability of the code make Asterias a unique suite that can exploit features specific to web-based environments.