10 research outputs found

    "The Clever machine"- a computational tool for dataset exploration and prediction

    The purpose of my doctoral studies was to develop an algorithm for large-scale analysis of protein sets. This thesis outlines the methodology and technical work performed, as well as the relevant biological cases involved in the creation of the core algorithm, the cleverMachine (CM), and its extensions multiCleverMachine (mCM) and cleverGO. The CM and mCM characterise and classify protein groups based on physico-chemical features, along with protein abundance and Gene Ontology annotation information, to support accurate data exploration. My method provides both computational and experimental scientists with a comprehensive, easy-to-use interface for high-throughput protein sequence screening and classification.

    Neurodegeneration and cancer: where the disorder prevails

    It has been reported that genes up-regulated in cancer are often down-regulated in neurodegenerative disorders and vice versa. The fact that apparently unrelated diseases share functional pathways suggests a link between their etiopathogenesis and the properties of molecules involved. Are there specific features that explain the exclusive association of proteins with either cancer or neurodegeneration? We performed a large-scale analysis of physico-chemical properties to understand what characteristics differentiate classes of diseases. We found that structural disorder significantly distinguishes proteins up-regulated in neurodegenerative diseases from those linked to cancer. We also observed high correlation between structural disorder and age of onset in Frontotemporal Dementia, Parkinson's and Alzheimer's diseases, which strongly supports the role of protein unfolding in neurodegenerative processes.
    The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013), through the European Research Council, under grant agreement RIBOMYLOME_309545 (Gian Gaetano Tartaglia), and from the Fundació La Marató de TV3 (20142731). We also acknowledge support from AGAUR (2014 SGR00685) and the Spanish Ministry of Economy and Competitiveness (BFU2014-55054-P), ‘Centro de Excelencia Severo Ochoa 2013-2017’ (SEV-2012-0208).

    Protein aggregation, structural disorder and RNA-binding ability: a new approach for physico-chemical and gene ontology classification of multiple datasets

    BACKGROUND: Comparison between multiple protein datasets requires the choice of an appropriate reference system and a number of variables to describe their differences. Here we introduce an innovative approach to discriminate multiple protein datasets (multiCM) and to measure enrichments in gene ontology terms (cleverGO) using semantic similarities. RESULTS: We illustrate the power of our approach by investigating the links between RNA-binding ability and other protein features, such as structural disorder and aggregation, in S. cerevisiae, C. elegans, M. musculus and H. sapiens. Our results are in striking agreement with available experimental evidence and unravel features that are key to understanding the mechanisms regulating cellular homeostasis. CONCLUSIONS: In an intuitive way, multiCM and cleverGO provide accurate classifications of physico-chemical features and annotations of biological processes, molecular functions and cellular components, which is extremely useful for the discovery and characterization of new trends in protein datasets. The multiCM and cleverGO can be freely accessed on the Web at http://www.tartaglialab.com/cs_multi/submission and http://www.tartaglialab.com/GO_analyser/universal. Each of the pages contains links to the corresponding documentation and tutorial.
    The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013), through the European Research Council, under grant agreement RIBOMYLOME_309545 (Gian Gaetano Tartaglia), and from the Spanish Ministry of Economy and Competitiveness (BFU2014-55054-P). We also acknowledge support from AGAUR (2014 SGR 00685), the Spanish Ministry of Economy and Competitiveness, ‘Centro de Excelencia Severo Ochoa 2013–2017’ (SEV-2012-0208). PK and RDP are recipients of “La Caixa” and “Severo Ochoa” studentships, respectively.

    catRAPID omics: a web server for large-scale prediction of protein-RNA interactions

    SUMMARY: Here we introduce catRAPID omics, a server for large-scale calculations of protein-RNA interactions. Our web server allows (i) predictions at proteomic and transcriptomic level; (ii) use of protein and RNA sequences without size restriction; (iii) analysis of nucleic acid binding regions in proteins; and (iv) detection of RNA motifs involved in protein recognition. RESULTS: We developed a web server to allow fast calculation of ribonucleoprotein associations in Caenorhabditis elegans, Danio rerio, Drosophila melanogaster, Homo sapiens, Mus musculus, Rattus norvegicus, Saccharomyces cerevisiae and Xenopus tropicalis (custom libraries can also be generated). The catRAPID omics was benchmarked on the recently published RNA interactomes of Serine/arginine-rich splicing factor 1 (SRSF1), Histone-lysine N-methyltransferase EZH2 (EZH2), TAR DNA-binding protein 43 (TDP43) and RNA-binding protein FUS (FUS) as well as on the protein interactomes of U1/U2 small nucleolar RNAs, X inactive specific transcript (Xist) repeat A region (RepA) and Crumbs homolog 3 (CRB3) 3'-untranslated region RNAs. Our predictions are highly significant (P < 0.05) and will help the experimentalist to identify candidates for further validation. AVAILABILITY: catRAPID omics can be freely accessed on the Web at http://s.tartaglialab.com/catrapid/omics. Documentation, tutorial and FAQs are available at http://s.tartaglialab.com/page/catrapid_group.
    Funding: Spanish Ministry of Economy and Competitiveness (SAF2011-26211), the European Research Council (ERC Starting Grant to G.G.T) and the RTTIC project (to A.Z.). ‘La Caixa’ fellowship (to P.K.). Programa de Ayudas FPI del Ministerio de Economia y Competitividad—BES-2012-052457 (to D.M.

    catRAPID signature: identification of ribonucleoproteins and RNA-binding regions

    MOTIVATION: Recent technological advances revealed that an unexpectedly large number of proteins interact with transcripts even if the RNA-binding domains are not annotated. We introduce catRAPID signature to identify ribonucleoproteins based on physico-chemical features instead of sequence similarity searches. The algorithm, trained on human proteins and tested on model organisms, calculates the overall RNA-binding propensity followed by the prediction of RNA-binding regions. catRAPID signature outperforms other algorithms in the identification of RNA-binding proteins and detection of non-classical RNA-binding regions. Results are visualized on a webpage and can be downloaded or forwarded to catRAPID omics for predictions of RNA targets. AVAILABILITY AND IMPLEMENTATION: catRAPID signature can be accessed at http://s.tartaglialab.com/new_submission/signature
    The research leading to these results has received funding from the European Research Council under the European Union’s Seventh Framework Programme (FP7/2007-2013)/ERC grant agreement nº RIBOMYLOME_309545. We acknowledge support of the Spanish Ministry of Economy and Competitiveness, ‘Centro de Excelencia Severo Ochoa 2013-2017’, SEV-2012-0208 and FEDER funds (European Regional Development Fund) under the project number BFU2014-55054-P. R. Delli Ponti is supported by the MINECO’s pre-doctoral grant Severo Ochoa 2013–2017 (SVP-2014-068402).

    Neurodegenerative diseases: quantitative predictions of protein-RNA interactions

    Increasing evidence indicates that RNA plays an active role in a number of neurodegenerative diseases. We recently introduced a theoretical framework, catRAPID, to predict the binding ability of protein and RNA molecules. Here, we use catRAPID to investigate ribonucleoprotein interactions linked to inherited intellectual disability, amyotrophic lateral sclerosis, Creutzfeldt-Jakob, Alzheimer's, and Parkinson's diseases. We specifically focus on (1) RNA interactions with fragile X mental retardation protein FMRP; (2) protein sequestration caused by CGG repeats; (3) noncoding transcripts regulated by TAR DNA-binding protein 43 TDP-43; (4) autogenous regulation of TDP-43 and FMRP; (5) iron-mediated expression of amyloid precursor protein APP and α-synuclein; (6) interactions between prions and RNA aptamers. Our results are in striking agreement with experimental evidence and provide new insights into processes associated with neuronal function and dysfunction.
    This work was supported by the Spanish Ministry of Economy and Competitiveness (SAF2011-26211), the Programa de Ayudas FPI del Ministerio de Economia y Competitividad—BES-2012-052457 and by a grant from “la Caixa” to Petr Klu

    Non-random distribution of homo-repeats: links with biological functions and human diseases

    The biological function of multiple repetitions of single amino acids, or homo-repeats, is largely unknown, but their occurrence in proteins has been associated with more than 20 hereditary diseases. Analysing 122 bacterial and eukaryotic genomes, we observed that the number of proteins containing homo-repeats is significantly larger than expected from theoretical estimates. Analysis of statistical significance indicates that the minimal size of homo-repeats varies with amino acid type and proteome. In an attempt to characterize proteins harbouring long homo-repeats, we found that those containing polar or small amino acids S, P, H, E, D, K, Q and N are enriched in structural disorder as well as protein- and RNA-interactions. We observed that E, S, Q, G, L, P, D, A and H homo-repeats are strongly linked with occurrence in human diseases. Moreover, S, E, P, A, Q, D and T homo-repeats are significantly enriched in neuronal proteins associated with autism and other disorders. We release a webserver for further exploration of homo-repeat occurrence in human pathology at http://bioinfo.protres.ru/hradis/.
    This study was supported by the Russian Science Foundation grant number 14-14-00536 for OVG and the programs “Molecular and Cellular Biology” (01201353567) for MYL and IVS. GGT received funding from the European Union Seventh Framework Programme (FP7/2007–2013), through the European Research Council, under grant agreement RIBOMYLOME_309545, and from the Spanish Ministry of Economy and Competitiveness. PK and GGT also acknowledge support from the Spanish Ministry of Economy and Competitiveness, ‘Centro de Excelencia Severo Ochoa 2013–2017’ (SEV-2012-0208). PK is recipient of a “La Caixa” studentship.