    Integration of CLIP experiments of RNA-binding proteins: a novel approach to predict context-dependent splicing factors from transcriptomic data

    Background: Splicing is a genetic process that has important implications in several diseases, including cancer. Deciphering the complex rules of splicing regulation is crucial to understanding and treating splicing-related diseases. Splicing factors and other RNA-binding proteins (RBPs) play a key role in the regulation of splicing. The specific binding sites of an RBP can be measured using CLIP experiments. However, to unveil which RBPs regulate a condition, it is necessary to have a priori hypotheses, as a single CLIP experiment targets a single protein. Results: In this work, we present a novel methodology to predict context-specific splicing factors from transcriptomic data. For this, we systematically collect, integrate and analyze more than 900 CLIP experiments stored in four CLIP databases: POSTAR2, CLIPdb, DoRiNA and StarBase. The analysis of these experiments shows strong coherence between the binding sites of RBPs of similar families. Augmenting this information with expression changes, we are able to correctly predict the splicing factors that regulate splicing in two gold-standard experiments in which specific splicing factors are knocked down. Conclusions: The methodology presented in this study allows the prediction of active splicing factors in cancer or any other condition using only transcript expression information. This approach opens a wide range of possible studies to understand the splicing regulation of different conditions. A tutorial with the source code and databases is available at https://gitlab.com/fcarazo.m/sfprediction

    TranscriptAchilles: a genome-wide platform to predict isoform biomarkers of gene essentiality in cancer

    Background: Aberrant alternative splicing plays a key role in cancer development. In recent years, alternative splicing has been used as a prognosis biomarker, a therapy response biomarker, and even as a therapeutic target. Next-generation RNA sequencing has an unprecedented potential to measure the transcriptome. However, due to the complexity of dealing with isoforms, the scientific community has not sufficiently exploited this valuable resource in precision medicine. Findings: We present TranscriptAchilles, the first large-scale tool to predict transcript biomarkers associated with gene essentiality in cancer. This application integrates 412 loss-of-function RNA interference screens of >17,000 genes, together with their corresponding whole-transcriptome expression profiling. Using this tool, we have studied the cancer subtypes in which alternative splicing plays a significant role in determining gene essentiality. In addition, we include a case study of renal cell carcinoma that shows the biological soundness of the results. The databases, the source code, and a guide to build the platform within a Docker container are available at GitLab. The application is also available online. Conclusions: TranscriptAchilles provides a user-friendly web interface to identify transcript or gene biomarkers of gene essentiality, which could be used as a starting point for a drug development project. This approach opens a wide range of translational applications in cancer.

    ISOGO: Functional annotation of protein-coding splice variants

    The advent of RNA-seq technologies has switched the paradigm of genetic analysis from a genome-based to a transcriptome-based perspective. Alternative splicing generates functional diversity in genes, but the precise functions of many individual isoforms are yet to be elucidated. Gene Ontology was developed to annotate gene products according to their biological processes, molecular functions and cellular components. Although a single gene may have several gene products, most annotations are not isoform-specific and do not distinguish the functions of the different proteins originating from a single gene. Several approaches have tried to automatically annotate ontologies at the isoform level, but this has proven to be a daunting task. We have developed ISOGO (ISOform+GO function imputation), a novel algorithm to predict the function of coding isoforms based on their protein domains and their correlation of expression across 11,373 cancer patients. Combining these two sources of information outperforms previous approaches: it provides an area under the precision-recall curve (AUPRC) five times larger than previous attempts, and the median AUROC of assigned functions to genes is 0.82. We tested ISOGO predictions on some genes with isoform-specific functions (BRCA1, MADD, VAMP7 and ITSN1) and they were coherent with the literature. In addition, we examined whether the main isoform of each gene, as predicted by APPRIS, was the most likely to have the annotated gene functions, and this holds for 99.4% of the genes. We also evaluated the predictions for isoform-specific functions provided by the CAFA3 challenge, and the results were also convincing. To make these results available to the scientific community, we have deployed a web application to consult ISOGO predictions (https://biotecnun.unav.es/app/isogo). Initial data, the website link, isoform-specific GO function predictions and R code are available at https://gitlab.com/icassol/isogo

    EventPointer 3.0: flexible and accurate splicing analysis that includes studying the differential usage of protein-domains

    Alternative splicing (AS) plays a key role in cancer: all its hallmarks have been associated with different mechanisms of abnormal AS. The improvement of the human transcriptome annotation and the availability of fast and accurate software to estimate isoform concentrations have boosted the analysis of transcriptome profiling from RNA-seq. The statistical analysis of AS is a challenging problem not yet fully solved. We have included in EventPointer (EP), a Bioconductor package, a novel statistical method that can use the bootstrap of the pseudoaligners. We compared it with other state-of-the-art algorithms to analyze AS. Its performance is outstanding for shallow sequencing conditions. The statistical framework is very flexible since it is based on design and contrast matrices. EP now includes a convenient tool to find the primers to validate the discoveries using PCR. We also added a statistical module to study alterations in protein domains related to AS. Applying it to 9514 patients from TCGA and TARGET in 19 different tumor types resulted in two conclusions: i) aberrant alternative splicing alters the relative presence of protein domains and ii) the number of enriched domains is strongly correlated with the age of the patients.

    Analysis of natural language processing tools for structuring medical texts

    The long-standing scarcity of structured information in medicine prevents new Artificial Intelligence technologies for data analysis from being applied in this field. New NLP applications have been created to process medical texts automatically and thereby increase the amount of structured data. This project analyzes the available tools and proposes a solution for structuring medical texts according to the needs of the company Naru Intelligence. After analyzing and comparing the tools currently available for structuring medical texts, it was observed that many of them are of poor quality or cannot structure arbitrary text. Amazon Comprehend Medical (an AWS service) is an interesting option for structuring medical texts; its main limitation is the supported language (English).

    Interpretable precision medicine for acute myeloid leukemia

    Precision medicine (PM) is a branch of medicine that defines a disease at a higher resolution using genetic and other technologies to enable more specific targeting of its subgroups. Because of its uses in clinical treatment and diagnostics, this field exemplifies the modern era of medicine. PM looks for not just the right drug, but also the right dosage and treatment regimen. PM encounters a variety of challenges, which will be explored in this dissertation. Large-scale sensitivity screens and whole-exome sequencing (WES) experiments have fostered a new wave of targeted treatments based on finding associations between drug sensitivity and response biomarkers. These experiments, with the aid of state-of-the-art artificial intelligence (AI) algorithms, are opening new therapeutic opportunities for diseases with unmet clinical needs. It has been shown that AI is capable of predicting novel personalized treatments based on complex genotypic and phenotypic patterns in tumors. The scientific community should make an effort to make these algorithms interpretable to humans so that the results can be easily approved by medical regulators. The purpose of this thesis is to apply AI algorithms for precision oncology that are highly accurate, while guaranteeing that the predictions are interpretable by humans. This work is divided into three main sections. The first section comprises a new methodology to increase the predictive power of the discovery of novel treatments in large-scale screenings by exploiting the fact that some biomarkers tend to appear in many treatments. This phenomenon is called the hub effect in gene essentiality (HUGE). The content of this section was published in [1]. The second section contains a novel interpretable AI method, called multi-dimensional module optimization (MOM), that associates drug screening with genetic events and proposes a treatment guideline. The content of this section was published in [2].
    Finally, the third section includes a detailed comparison of different recently published algorithms that attempt to overcome the barriers posed by today's precision medicine. This study also includes two novel algorithms specifically designed to solve the challenges of applicability to clinical practice: Optimal Decision Tree (ODT) and Multinomial Lasso. The characterization of Interpretable Artificial Intelligence as an approach with strong potential for use in clinical practice is one of the study's most significant achievements. We present unique methods for PM that are highly interpretable, and we summarize the needs that could be considered for constructing interpretable AI. We are confident that this method will transform the way PM is addressed, bridging the gap between AI and clinical practice.
