
    A machine learning pipeline for quantitative phenotype prediction from genotype data

    Background: Quantitative phenotypes emerge everywhere in systems biology and biomedicine. This happens either because of a direct interest in quantitative traits, or because high individual variability makes it hard or impossible to classify samples into distinct categories, as is often the case with complex common diseases. Machine learning approaches to genotype-phenotype mapping may significantly improve the results of Genome-Wide Association Studies (GWAS) by explicitly focusing on predictivity and optimal feature selection in a multivariate setting. It is, however, essential that stringent and well-documented Data Analysis Protocols (DAP) are used to control sources of variability and to ensure the reproducibility of results. We present a genome-to-phenotype pipeline of machine learning modules for quantitative phenotype prediction. The pipeline can be applied to make direct use of whole-genome information in functional studies. As a realistic example, we consider the problem of fitting complex phenotypic traits in heterogeneous stock mice from single nucleotide polymorphisms (SNPs).
    Methods: The core element of the pipeline is the L1L2 regularization method based on the naïve elastic net. The method simultaneously provides a regression model and a dimensionality reduction procedure suitable for correlated features. Models and SNP markers are selected through a DAP originally developed in the MAQC-II collaborative initiative of the U.S. FDA for the identification of clinical biomarkers from microarray data. The L1L2 approach is compared with standard Support Vector Regression (SVR) and with Reversible Jump Markov Chain Monte Carlo (MCMC). Algebraic indicators of the stability of partial lists are used for model selection. The final panel of markers is obtained by a chromosome-scale procedure, termed 'saturation', that recovers SNPs in Linkage Disequilibrium with those selected.
    Results: The L1L2 pipeline achieves accuracies comparable to those of both MCMC and SVR. Good agreement is also found between the SNPs selected by the L1L2 algorithms and candidate loci previously identified by a standard GWAS. Combining L1L2-based feature selection with a saturation procedure addresses a problem that affects many feature selection algorithms: the neglect of highly correlated features.
    Conclusions: The L1L2 pipeline has proven effective in terms of marker selection and prediction accuracy. This study indicates that machine learning techniques may support quantitative phenotype prediction, provided that adequate DAPs are employed to control bias in model selection.
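    The naïve elastic net combines an L1 (lasso) penalty with an L2 (ridge) penalty, so a single fit yields both a regression model and a sparse set of selected features. Below is a minimal Python/NumPy sketch of cyclic coordinate descent for one common parameterization of the naïve elastic net objective. It is not the pipeline's L1L2 implementation, whose penalty scaling and solver may differ, and it omits the DAP entirely (resampling, stability-based model selection, saturation). The function names, the penalty values `l1` and `l2`, and the 0/1/2 genotype coding of the synthetic data are assumptions made for illustration.

```python
import numpy as np

def soft_threshold(a, t):
    """Soft-thresholding operator: sign(a) * max(|a| - t, 0)."""
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def naive_elastic_net(X, y, l1=0.1, l2=0.1, max_iter=1000, tol=1e-8):
    """Cyclic coordinate descent for the (illustrative) naive elastic net objective
        (1/2n) * ||y - X b||^2 + l1 * ||b||_1 + (l2/2) * ||b||^2,
    assuming X has centered columns and y is centered (so no intercept is fitted)."""
    n, p = X.shape
    b = np.zeros(p)
    r = y.astype(float).copy()            # residual y - X b, with b = 0 initially
    z = (X ** 2).sum(axis=0) / n          # (1/n) * x_j' x_j for each feature
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            # correlation of feature j with the partial residual (its own contribution added back)
            rho = X[:, j] @ r / n + z[j] * b[j]
            b_new = soft_threshold(rho, l1) / (z[j] + l2)
            delta = b_new - b[j]
            if delta != 0.0:
                r -= X[:, j] * delta      # keep the residual in sync with b
                b[j] = b_new
                max_change = max(max_change, abs(delta))
        if max_change < tol:              # stop once no coefficient moves appreciably
            break
    return b

# Hypothetical synthetic data: 200 samples x 50 SNPs coded as allele counts 0/1/2,
# with a quantitative phenotype driven by SNPs 3 and 10 plus Gaussian noise.
rng = np.random.default_rng(0)
G = rng.integers(0, 3, size=(200, 50)).astype(float)
X = (G - G.mean(axis=0)) / G.std(axis=0)  # standardize each SNP column
y = 2.0 * X[:, 3] - 1.5 * X[:, 10] + 0.5 * rng.standard_normal(200)
y -= y.mean()

coef = naive_elastic_net(X, y, l1=0.1, l2=0.1)
selected = np.flatnonzero(coef)           # indices of SNPs with nonzero weight
print("selected SNPs:", selected)
print("two largest |coef|:", np.argsort(-np.abs(coef))[:2])
```

    The L2 term makes the objective strictly convex. This gives the elastic net its grouping effect: strongly correlated features, such as SNPs in linkage disequilibrium, tend to receive similar coefficients instead of one being chosen arbitrarily, as the lasso may do. "Naïve" refers to omitting the (1 + λ2) rescaling that Zou and Hastie apply to correct the resulting double shrinkage.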

    Sprouty Proteins Inhibit Receptor-mediated Activation of Phosphatidylinositol-specific Phospholipase C

    PLCγ binds Spry1 and Spry2. Overexpression of Spry decreased PLCγ activity and IP3 and DAG production, whereas Spry-deficient cells yielded more IP3. Spry overexpression inhibited T-cell receptor (TCR) signaling, and Spry1-null T cells hyperproliferated upon TCR ligation. Through its action on PLCγ, Spry may influence signaling through multiple receptors.

    The Impact of Multifunctional Genes on "Guilt by Association" Analysis

    Many previous studies have shown that variants of “guilt-by-association” can yield gene function predictions with very high statistical confidence. These studies assume that the “associations” of a gene in the data (e.g., its protein interaction partners) are necessary for establishing “guilt”. In this paper we show that multifunctionality, rather than association, is a primary driver of gene function prediction. We first show that knowledge of the degree of multifunctionality alone can produce astonishingly strong performance when used as a predictor of gene function. We then demonstrate how multifunctionality is encoded in gene interaction data (such as protein interactions and coexpression networks) and how this can feed forward into gene function prediction algorithms. We find that high-quality gene function predictions can be made using data that contain no information on which gene interacts with which. By examining a wide range of networks from mouse, human and yeast, as well as multiple prediction methods and evaluation metrics, we provide evidence that this problem is pervasive and does not reflect the failings of any particular algorithm or data type. We propose computational controls that provide more meaningful estimates of gene function prediction performance. We suggest that this multifunctionality bias is important to control for, with widespread implications for the interpretation of genomics studies.
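    The multifunctionality baseline described above needs no interaction data at all: genes are simply ranked by how many functional annotations they already carry. The sketch below assumes a binary gene-by-term annotation matrix. As a simplified proxy, not the paper's exact score, it uses the plain count of a gene's other terms as its multifunctionality score. Each term is then evaluated with the AUROC, computed from the Mann-Whitney U statistic. The matrix and the function names are hypothetical.

```python
import numpy as np

def average_ranks(x):
    """1-based ranks of x; tied values receive the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1                        # extend the run of tied values
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def auroc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic."""
    labels = np.asarray(labels, dtype=bool)
    n_pos = int(labels.sum())
    n_neg = len(labels) - n_pos
    r = average_ranks(scores)
    u = r[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)

def multifunctionality_baseline(A):
    """For each term t, rank genes by how many *other* terms they carry
    and return the AUROC for recovering the genes annotated to t."""
    A = np.asarray(A, dtype=int)
    totals = A.sum(axis=1)
    results = {}
    for t in range(A.shape[1]):
        labels = A[:, t].astype(bool)
        if labels.all() or not labels.any():
            continue                      # AUROC is undefined without both classes
        scores = totals - A[:, t]         # leave the evaluated term out of the score
        results[t] = auroc(scores, labels)
    return results

# Hypothetical gene x term annotation matrix (rows: genes, columns: terms).
A = np.array([
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
])
aucs = multifunctionality_baseline(A)
print(aucs)
```

    Leaving the evaluated term out of each gene's score avoids trivially leaking the labels. Even so, an AUROC well above 0.5 can arise purely because heavily annotated genes are more likely to carry any given term. This is the kind of bias the abstract argues must be controlled for.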

    A mRNA landscape of bovine embryos after standard and MAPK-inhibited culture conditions: a comparative analysis.

    BACKGROUND: Genes and signalling pathways involved in pluripotency have been studied extensively in mouse and human pre-implantation embryos and embryonic stem (ES) cells. The unsuccessful attempts to generate ES cell lines from other species, including cattle, suggest that other genes and pathways are involved in maintaining pluripotency in these species. To investigate which genes are involved in bovine pluripotency, expression profiles were generated from morula, blastocyst, trophectoderm (TE) and inner cell mass (ICM) samples using microarray analysis. Because MAPK inhibition can increase the NANOG/GATA6 ratio in the ICM, blastocysts were additionally cultured in the presence of a MAPK inhibitor, and changes in gene expression in the ICM were analysed. RESULTS: Between morula and blastocyst, 3,774 genes were differentially expressed, and the largest differences were found among genes up-regulated in the blastocyst. Gene ontology (GO) analysis identified lipid metabolic process as the term most enriched among genes expressed at higher levels in blastocysts. Genes with higher expression levels in morulae were enriched in the RNA processing GO term. Of the 497 genes differentially expressed between ICM and TE, NANOG, SOX2 and POU5F1 showed increased expression in the ICM, confirming their evolutionarily conserved role in pluripotency. Several genes implicated in differentiation or fate determination were also expressed at higher levels in the ICM. Genes expressed at higher levels in the ICM were enriched in the RNA splicing and regulation of gene expression GO terms. Although NANOG expression was elevated upon MAPK inhibition, SOX2 and POU5F1 expression showed little increase. Expression of other genes in the MAPK pathway, including DUSP4 and SPRY4, and of genes influenced by MAPK inhibition, such as IFNT, was down-regulated. CONCLUSION: The data obtained from the microarray studies provide further insight into gene expression during bovine embryonic development. They show an expression profile in pluripotent cells that indicates a pluripotent, epiblast-like state. The inability to culture ICM cells as stem cells in the presence of an inhibitor of MAPK activity, together with the reported data, indicates that MAPK inhibition alone is not sufficient to maintain a pluripotent character in bovine cells.

    Drosophila cbl Is Essential for Control of Cell Death and Cell Differentiation during Eye Development

    Activation of cell surface receptors transduces extracellular signals into cellular responses such as proliferation, differentiation and survival. However, the appropriate spatial and temporal down-regulation of these receptors is as important for normal development and tissue homeostasis as their activation. The Cbl family of E3 ubiquitin ligases plays a major role in the ligand-dependent inactivation of receptor tyrosine kinases (RTKs), most notably the Epidermal Growth Factor Receptor (EGFR), through ubiquitin-mediated endocytosis and lysosomal degradation. Here, we report the mutant phenotypes of Drosophila cbl (D-cbl) during eye development. D-cbl mutants display overgrowth, inhibition of apoptosis, differentiation defects and increased ommatidial spacing. Using genetic interactions and molecular markers, we show that most of these phenotypes are caused by increased activity of the Drosophila EGFR. Our genetic data also indicate a critical role for ubiquitination in D-cbl function, consistent with biochemical models. These data may provide a mechanistic model for understanding the oncogenic activity of mammalian cbl genes.

    Negative feedback regulation of the ERK1/2 MAPK pathway

    The extracellular signal-regulated kinase 1/2 (ERK1/2) mitogen-activated protein kinase (MAPK) signalling pathway regulates many cellular functions, including proliferation, differentiation, and transformation. To reliably convert external stimuli into specific cellular responses and to adapt to environmental circumstances, the pathway must be integrated into the overall signalling activity of the cell. Multiple mechanisms have evolved to perform this role. In this review, we focus on negative feedback mechanisms and examine how they shape ERK1/2 MAPK signalling. We first discuss the extensive number of negative feedback loops targeting the different components of the ERK1/2 MAPK cascade. These act through two routes: the direct posttranslational modification of pathway components by downstream protein kinases, and the induction of de novo synthesis of specific pathway inhibitors. We then evaluate how negative feedback modulates the spatiotemporal dynamics of ERK1/2 signalling, in terms of signalling amplitude and duration as well as subcellular localisation. Aberrant ERK1/2 activation results in deregulated proliferation and malignant transformation in model systems and is commonly observed in human tumours. The ERK1/2 pathway thus represents an attractive target for the treatment of malignant tumours with increased ERK1/2 activity. We therefore discuss the effect of ERK1/2 MAPK feedback regulation on cancer treatment, and how it contributes to the reduced clinical efficacy of therapeutic agents and to the development of drug resistance.

    Integrating Diverse Datasets Improves Developmental Enhancer Prediction

    Gene-regulatory enhancers have been identified using various approaches, including evolutionary conservation, regulatory protein binding, chromatin modifications, and DNA sequence motifs. To integrate these different approaches, we developed EnhancerFinder, a two-step method that first distinguishes developmental enhancers from the genomic background and then predicts their tissue specificity. EnhancerFinder uses a multiple kernel learning approach to integrate DNA sequence motifs, evolutionary patterns, and diverse functional genomics datasets from a variety of cell types. Some prediction approaches define enhancers based on histone marks or p300 sites from a single cell line. In contrast, we trained EnhancerFinder on hundreds of experimentally verified human developmental enhancers from the VISTA Enhancer Browser. We comprehensively evaluated EnhancerFinder using cross-validation. Our integrative method improves the identification of enhancers over approaches that consider a single type of data, such as sequence motifs, evolutionary conservation, or the binding of enhancer-associated proteins. We find that VISTA enhancers active in embryonic heart are easier to identify than enhancers active in several other embryonic tissues, likely because of their uniquely high GC content. We applied EnhancerFinder to the entire human genome and predicted 84,301 developmental enhancers and their tissue specificity. These predictions provide specific functional annotations for large amounts of human non-coding DNA. They are significantly enriched near genes with annotated roles in their predicted tissues and near lead SNPs from genome-wide association studies. We demonstrate the utility of EnhancerFinder predictions through in vivo validation of novel embryonic gene regulatory enhancers from three developmental transcription factor loci. Our genome-wide developmental enhancer predictions are freely available as a UCSC Genome Browser track, which we hope will enable researchers to further investigate questions in developmental biology. © 2014 Erwin et al.

    Paleotemperature Proxies from Leaf Fossils Reinterpreted in Light of Evolutionary History

    Present-day correlations between leaf physiognomic traits (shape and size) and climate are widely used to estimate paleoclimate from fossil floras. For example, leaf-margin analysis estimates paleotemperature using the modern relation between mean annual temperature (MAT) and the site proportion of untoothed-leaf species (NT). This uniformitarian approach should provide accurate paleoclimate reconstructions under a core assumption: that leaf-trait variation results principally from adaptive convergence to the environment. Such variation would then be largely independent of phylogeny, so the trait-climate relation should be constant through geologic time. Much research acknowledges and investigates possible pitfalls in paleoclimate estimation based on leaf physiognomy. Even so, the core assumption has never been explicitly tested in a phylogenetic comparative framework. We combined an extant dataset of 21 leaf traits and temperature with a phylogenetic hypothesis for 569 species-site pairs at 17 sites, and found varying amounts of non-random phylogenetic signal in all traits. Comparisons of phylogenetic and standard regressions generally support prevailing ideas that leaf traits respond adaptively to temperature. However, wider confidence intervals and shifts in slope and intercept indicate an overall reduced ability to predict climate precisely, owing to the non-random phylogenetic signal. Notably, the modern-day relation between the proportion of untoothed taxa and mean annual temperature (NT-MAT), central to paleotemperature inference, was greatly modified and reduced. This indicates that the modern correlation results primarily from biogeographic history. Importantly, some tooth traits, such as number of teeth, had similar or steeper slopes after taking phylogeny into account, suggesting that leaf teeth display a pattern of exaptive evolution in higher latitudes. This study shows that the assumption of convergence required for precise, quantitative temperature estimates from present-day leaf traits is not supported by empirical evidence. We therefore have very low confidence in previously published numerical paleotemperature estimates. However, interpreting qualitative changes in paleotemperature remains warranted under certain conditions, such as stratigraphically closely spaced samples with floristic continuity.
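    Generalized least squares makes the contrast between standard and phylogenetic regression concrete, because it lets the residuals of related species covary. The sketch below fits a line y = a + b*x by phylogenetic generalized least squares (PGLS) under an assumed Brownian-motion covariance matrix V, whose entries are shared branch lengths from the root. With V equal to the identity, the fit reduces to ordinary least squares. The six-species tree, the trait values and the temperatures are hypothetical, and the study's own phylogenetic model may differ.

```python
import numpy as np

def gls_line(x, y, V):
    """Fit y = a + b*x by generalized least squares, assuming the residual
    covariance is proportional to V. Returns (coefficients [a, b], standard errors)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])      # design matrix: intercept + trait
    Vinv = np.linalg.inv(V)
    XtVi = X.T @ Vinv
    beta = np.linalg.solve(XtVi @ X, XtVi @ y)      # (X' V^-1 X)^-1 X' V^-1 y
    resid = y - X @ beta
    sigma2 = max(resid @ Vinv @ resid / (len(y) - 2), 0.0)  # guard against tiny negative rounding
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(XtVi @ X)))
    return beta, se

# Hypothetical tree ((A,B,C),(D,E,F)): each clade hangs from the root on a branch
# of length 1 and every tip branch has length 1, so under Brownian motion
# V[i, j] = shared root-to-tip path length (2 on the diagonal, 1 within a clade, 0 across).
clade = np.array([0, 0, 0, 1, 1, 1])
V = (clade[:, None] == clade[None, :]).astype(float) + np.eye(6)

# Hypothetical leaf-trait values and site temperatures (deg C), one row per species.
x = np.array([0.20, 0.30, 0.25, 0.70, 0.80, 0.75])
y = np.array([10.0, 11.0, 10.5, 20.0, 19.0, 21.0])

ols_beta, ols_se = gls_line(x, y, np.eye(6))   # identity covariance = ordinary least squares
pgls_beta, pgls_se = gls_line(x, y, V)
print("OLS  a, b:", ols_beta, "SE:", ols_se)
print("PGLS a, b:", pgls_beta, "SE:", pgls_se)
```

    PGLS treats species within a clade as partly redundant, which lowers the effective sample size. This can shift the slope and intercept and widen the standard errors, which is the kind of change the abstract reports for the NT-MAT relation.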

    Functional roles of fibroblast growth factor receptors (FGFRs) signaling in human cancers
