197 research outputs found

    Stability analysis of mixtures of mutagenetic trees

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Mixture models of mutagenetic trees are evolutionary models that capture several pathways of ordered accumulation of genetic events observed in different subsets of patients. They were used to model HIV progression by accumulation of resistance mutations in the viral genome under drug pressure and cancer progression by accumulation of chromosomal aberrations in tumor cells. From the mixture models a genetic progression score (GPS) can be derived that estimates the genetic status of single patients according to the corresponding progression along the tree models. GPS values were shown to have predictive power for estimating drug resistance in HIV or the survival time in cancer. Still, the reliability of the exact values of such complex markers derived from graphical models can be questioned.</p> <p>Results</p> <p>In a simulation study, we analyzed various aspects of the stability of estimated mutagenetic trees mixture models. It turned out that the induced probabilistic distributions and the tree topologies are recovered with high precision by an EM-like learning algorithm. However, only for models with just one major model component, also GPS values of single patients can be reliably estimated.</p> <p>Conclusion</p> <p>It is encouraging that the estimation process of mutagenetic trees mixture models can be performed with high confidence regarding induced probability distributions and the general shape of the tree topologies. For a model with only one major disease progression process, even genetic progression scores for single patients can be reliably estimated. However, for models with more than one relevant component, alternative measures should be introduced for estimating the stage of disease progression.</p

    Binary classification of dyslipidemia from the waist-to-hip ratio and body mass index: a comparison of linear, logistic, and CART models

    Get PDF
    BACKGROUND: We sought to improve upon previously published statistical modeling strategies for binary classification of dyslipidemia for general population screening purposes based on the waist-to-hip circumference ratio and body mass index anthropometric measurements. METHODS: Study subjects were participants in WHO-MONICA population-based surveys conducted in two Swiss regions. Outcome variables were based on the total serum cholesterol to high density lipoprotein cholesterol ratio. The other potential predictor variables were gender, age, current cigarette smoking, and hypertension. The models investigated were: (i) linear regression; (ii) logistic classification; (iii) regression trees; (iv) classification trees (iii and iv are collectively known as "CART"). Binary classification performance of the region-specific models was externally validated by classifying the subjects from the other region. RESULTS: Waist-to-hip circumference ratio and body mass index remained modest predictors of dyslipidemia. Correct classification rates for all models were 60–80%, with marked gender differences. Gender-specific models provided only small gains in classification. The external validations provided assurance about the stability of the models. CONCLUSIONS: There were no striking differences between either the algebraic (i, ii) vs. non-algebraic (iii, iv), or the regression (i, iii) vs. classification (ii, iv) modeling approaches. Anticipated advantages of the CART vs. simple additive linear and logistic models were less than expected in this particular application with a relatively small set of predictor variables. CART models may be more useful when considering main effects and interactions between larger sets of predictor variables

    Large-scale risk prediction applied to Genetic Analysis Workshop 17 mini-exome sequence data

    Get PDF
    We consider the application of Efron’s empirical Bayes classification method to risk prediction in a genome-wide association study using the Genetic Analysis Workshop 17 (GAW17) data. A major advantage of using this method is that the effect size distribution for the set of possible features is empirically estimated and that all subsequent parameter estimation and risk prediction is guided by this distribution. Here, we generalize Efron’s method to allow for some of the peculiarities of the GAW17 data. In particular, we introduce two ways to extend Efron’s model: a weighted empirical Bayes model and a joint covariance model that allows the model to properly incorporate the annotation information of single-nucleotide polymorphisms (SNPs). In the course of our analysis, we examine several aspects of the possible simulation model, including the identity of the most important genes, the differing effects of synonymous and nonsynonymous SNPs, and the relative roles of covariates and genes in conferring disease risk. Finally, we compare the three methods to each other and to other classifiers (random forest and neural network)

    Threat-sensitive anti-predator defence in precocial wader, the northern lapwing Vanellus vanellus

    Get PDF
    Birds exhibit various forms of anti-predator behaviours to avoid reproductive failure, with mobbing—observation, approach and usually harassment of a predator—being one of the most commonly observed. Here, we investigate patterns of temporal variation in the mobbing response exhibited by a precocial species, the northern lapwing (Vanellus vanellus). We test whether brood age and self-reliance, or the perceived risk posed by various predators, affect mobbing response of lapwings. We quantified aggressive interactions between lapwings and their natural avian predators and used generalized additive models to test how timing and predator species identity are related to the mobbing response of lapwings. Lapwings diversified mobbing response within the breeding season and depending on predator species. Raven Corvus corax, hooded crow Corvus cornix and harriers evoked the strongest response, while common buzzard Buteo buteo, white stork Ciconia ciconia, black-headed gull Chroicocephalus ridibundus and rook Corvus frugilegus were less frequently attacked. Lapwings increased their mobbing response against raven, common buzzard, white stork and rook throughout the breeding season, while defence against hooded crow, harriers and black-headed gull did not exhibit clear temporal patterns. Mobbing behaviour of lapwings apparently constitutes a flexible anti-predator strategy. The anti-predator response depends on predator species, which may suggest that lapwings distinguish between predator types and match mobbing response to the perceived hazard at different stages of the breeding cycle. We conclude that a single species may exhibit various patterns of temporal variation in anti-predator defence, which may correspond with various hypotheses derived from parental investment theory

    Genomic prediction in CIMMYT maize and wheat breeding programs

    Get PDF
    Genomic selection (GS) has been implemented in animal and plant species, and is regarded as a useful tool for accelerating genetic gains. Varying levels of genomic prediction accuracy have been obtained in plants, depending on the prediction problem assessed and on several other factors, such as trait heritability, the relationship between the individuals to be predicted and those used to train the models for prediction, number of markers, sample size and genotype × environment interaction (GE). The main objective of this article is to describe the results of genomic prediction in International Maize and Wheat Improvement Center's (CIMMYT's) maize and wheat breeding programs, from the initial assessment of the predictive ability of different models using pedigree and marker information to the present, when methods for implementing GS in practical global maize and wheat breeding programs are being studied and investigated. Results show that pedigree (population structure) accounts for a sizeable proportion of the prediction accuracy when a global population is the prediction problem to be assessed. However, when the prediction uses unrelated populations to train the prediction equations, prediction accuracy becomes negligible. When genomic prediction includes modeling GE, an increase in prediction accuracy can be achieved by borrowing information from correlated environments. Several questions on how to incorporate GS into CIMMYT's maize and wheat programs remain unanswered and subject to further investigation, for example, prediction within and between related bi-parental crosses. Further research on the quantification of breeding value components for GS in plant breeding populations is required.J Crossa, P PĂ©rez, J Hickey, J Burgueño, L Ornella, J CerĂłn-Rojas, X Zhang, S Dreisigacker, R Babu, Y Li, D Bonnett and K Mathew

    Grammatical evolution decision trees for detecting gene-gene interactions

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>A fundamental goal of human genetics is the discovery of polymorphisms that predict common, complex diseases. It is hypothesized that complex diseases are due to a myriad of factors including environmental exposures and complex genetic risk models, including gene-gene interactions. Such epistatic models present an important analytical challenge, requiring that methods perform not only statistical modeling, but also variable selection to generate testable genetic model hypotheses. This challenge is amplified by recent advances in genotyping technology, as the number of potential predictor variables is rapidly increasing.</p> <p>Methods</p> <p>Decision trees are a highly successful, easily interpretable data-mining method that are typically optimized with a hierarchical model building approach, which limits their potential to identify interacting effects. To overcome this limitation, we utilize evolutionary computation, specifically grammatical evolution, to build decision trees to detect and model gene-gene interactions. In the current study, we introduce the Grammatical Evolution Decision Trees (GEDT) method and software and evaluate this approach on simulated data representing gene-gene interaction models of a range of effect sizes. We compare the performance of the method to a traditional decision tree algorithm and a random search approach and demonstrate the improved performance of the method to detect purely epistatic interactions.</p> <p>Results</p> <p>The results of our simulations demonstrate that GEDT has high power to detect even very moderate genetic risk models. GEDT has high power to detect interactions with and without main effects.</p> <p>Conclusions</p> <p>GEDT, while still in its initial stages of development, is a promising new approach for identifying gene-gene interactions in genetic association studies.</p

    Determining Frequent Patterns of Copy Number Alterations in Cancer

    Get PDF
    Cancer progression is often driven by an accumulation of genetic changes but also accompanied by increasing genomic instability. These processes lead to a complicated landscape of copy number alterations (CNAs) within individual tumors and great diversity across tumor samples. High resolution array-based comparative genomic hybridization (aCGH) is being used to profile CNAs of ever larger tumor collections, and better computational methods for processing these data sets and identifying potential driver CNAs are needed. Typical studies of aCGH data sets take a pipeline approach, starting with segmentation of profiles, calls of gains and losses, and finally determination of frequent CNAs across samples. A drawback of pipelines is that choices at each step may produce different results, and biases are propagated forward. We present a mathematically robust new method that exploits probe-level correlations in aCGH data to discover subsets of samples that display common CNAs. Our algorithm is related to recent work on maximum-margin clustering. It does not require pre-segmentation of the data and also provides grouping of recurrent CNAs into clusters. We tested our approach on a large cohort of glioblastoma aCGH samples from The Cancer Genome Atlas and recovered almost all CNAs reported in the initial study. We also found additional significant CNAs missed by the original analysis but supported by earlier studies, and we identified significant correlations between CNAs

    Cross-validated stepwise regression for identification of novel non-nucleoside reverse transcriptase inhibitor resistance associated mutations

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Linear regression models are used to quantitatively predict drug resistance, the phenotype, from the HIV-1 viral genotype. As new antiretroviral drugs become available, new resistance pathways emerge and the number of resistance associated mutations continues to increase. To accurately identify which drug options are left, the main goal of the modeling has been to maximize predictivity and not interpretability. However, we originally selected linear regression as the preferred method for its transparency as opposed to other techniques such as neural networks. Here, we apply a method to lower the complexity of these phenotype prediction models using a 3-fold cross-validated selection of mutations.</p> <p>Results</p> <p>Compared to standard stepwise regression we were able to reduce the number of mutations in the reverse transcriptase (RT) inhibitor models as well as the number of interaction terms accounting for synergistic and antagonistic effects. This reduction in complexity was most significant for the non-nucleoside reverse transcriptase inhibitor (NNRTI) models, while maintaining prediction accuracy and retaining virtually all known resistance associated mutations as first order terms in the models. Furthermore, for etravirine (ETR) a better performance was seen on two years of unseen data. By analyzing the phenotype prediction models we identified a list of forty novel NNRTI mutations, putatively associated with resistance. The resistance association of novel variants at known NNRTI resistance positions: 100, 101, 181, 190, 221 and of mutations at positions not previously linked with NNRTI resistance: 102, 139, 219, 241, 376 and 382 was confirmed by phenotyping site-directed mutants.</p> <p>Conclusions</p> <p>We successfully identified and validated novel NNRTI resistance associated mutations by developing parsimonious resistance prediction models in which repeated cross-validation within the stepwise regression was applied. Our model selection technique is computationally feasible for large data sets and provides an approach to the continued identification of resistance-causing mutations.</p

    Predicting global invasion risks: a management tool to prevent future introductions

    Get PDF
    Predicting regions at risk from introductions of non-native species and the subsequent invasions is a fundamental aspect of horizon scanning activities that enable the development of more effective preventative actions and planning of management measures. The Asian cyprinid fish topmouth gudgeon Pseudorasbora parva has proved highly invasive across Europe since its introduction in the 1960s. In addition to direct negative impacts on native fish populations, P. parva has potential for further damage through transmission of an emergent infectious disease, known to cause mortality in other species. To quantify its invasion risk, in regions where it has yet to be introduced, we trained 900 ecological niche models and constructed an Ensemble Model predicting suitability, then integrated a proxy for introduction likelihood. This revealed high potential for P. parva to invade regions well beyond its current invasive range. These included areas in all modelled continents, with several hotspots of climatic suitability and risk of introduction. We believe that these methods are easily adapted for a variety of other invasive species and that such risk maps could be used by policy-makers and managers in hotspots to formulate increased surveillance and early-warning systems that aim to prevent introductions and subsequent invasions
    • 

    corecore