13 research outputs found
Integration of CLIP experiments of RNA-binding proteins: a novel approach to predict context-dependent splicing factors from transcriptomic data
Background: Splicing is a genetic process that has important implications in several diseases including cancer.
Deciphering the complex rules of splicing regulation is crucial to understand and treat splicing-related diseases. Splicing
factors and other RNA-binding proteins (RBPs) play a key role in the regulation of splicing. The specific binding sites of an
RBP can be measured using CLIP experiments. However, to unveil which RBPs regulate a condition, it is necessary to have
a priori hypotheses, as a single CLIP experiment targets a single protein.
Results: In this work, we present a novel methodology to predict context-specific splicing factors from transcriptomic
data. For this, we systematically collect, integrate and analyze more than 900 CLIP experiments stored in four CLIP
databases: POSTAR2, CLIPdb, DoRiNA and StarBase. The analysis of these experiments shows strong coherence
between the binding sites of RBPs from similar families. Augmenting this information with expression changes, we are
able to correctly predict the splicing factors that regulate splicing in two gold-standard experiments in which specific
splicing factors are knocked down.
Conclusions: The methodology presented in this study allows the prediction of active splicing factors in cancer
or any other condition using only transcript expression data. This approach opens a wide range of
possible studies to understand the splicing regulation of different conditions. A tutorial with the source code and
databases is available at https://gitlab.com/fcarazo.m/sfprediction
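As an illustration of the general idea of combining CLIP-derived binding sites with splicing changes, the following minimal Python sketch ranks RBPs by how strongly their bound events overlap a set of differentially spliced events, using a hypergeometric over-representation test. The choice of test, the function names, and the toy event and RBP identifiers are assumptions made for illustration only; the abstract does not state which statistic is used, and the tutorial linked above remains the authoritative reference.

```python
from math import comb


def hypergeom_enrichment_pvalue(n_total, n_bound, n_changed, n_overlap):
    """Upper-tail hypergeometric p-value P(X >= n_overlap).

    n_total   : number of splicing events in the background
    n_bound   : background events with a CLIP binding site for the RBP
    n_changed : background events that change between conditions
    n_overlap : events that are both bound and changed
    """
    max_overlap = min(n_bound, n_changed)
    denominator = comb(n_total, n_changed)
    tail = sum(
        comb(n_bound, k) * comb(n_total - n_bound, n_changed - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / denominator


def rank_splicing_factors(clip_targets, changed_events, all_events):
    """Return (rbp, overlap, p_value) tuples sorted by ascending p-value."""
    background = set(all_events)
    changed = set(changed_events) & background
    results = []
    for rbp, targets in clip_targets.items():
        bound = set(targets) & background
        overlap = len(bound & changed)
        p_value = hypergeom_enrichment_pvalue(
            len(background), len(bound), len(changed), overlap
        )
        results.append((rbp, overlap, p_value))
    return sorted(results, key=lambda item: item[2])


# Hypothetical toy data: ten events, three of which change between conditions.
all_events = [f"event{i}" for i in range(10)]
changed_events = ["event0", "event1", "event2"]
clip_targets = {
    "RBP_A": ["event0", "event1", "event2", "event3"],  # binds all changed events
    "RBP_B": ["event5", "event6", "event7", "event8"],  # binds none of them
}

for rbp, overlap, p_value in rank_splicing_factors(clip_targets, changed_events, all_events):
    print(rbp, overlap, round(p_value, 4))
```

In a real analysis the p-values would also need correction for multiple testing across the RBPs considered.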
TranscriptAchilles: a genome-wide platform to predict isoform biomarkers of gene essentiality in cancer
Background
Aberrant alternative splicing plays a key role in cancer development. In recent years, alternative splicing has been used as a prognosis biomarker, a therapy response biomarker, and even as a therapeutic target. Next-generation RNA sequencing has an unprecedented potential to measure the transcriptome. However, due to the complexity of dealing with isoforms, the scientific community has not sufficiently exploited this valuable resource in precision medicine.
Findings
We present TranscriptAchilles, the first large-scale tool to predict transcript biomarkers associated with gene essentiality in cancer. This application integrates 412 loss-of-function RNA interference screens of >17,000 genes, together with their corresponding whole-transcriptome expression profiling. Using this tool, we have studied the cancer subtypes in which alternative splicing plays a significant role in determining gene essentiality. In addition, we include a case study of renal cell carcinoma that shows the biological soundness of the results. The databases, the source code, and a guide to build the platform within a Docker container are available at GitLab. The application is also available online.
Conclusions
TranscriptAchilles provides a user-friendly web interface to identify transcript or gene biomarkers of gene essentiality, which could be used as a starting point for a drug development project. This approach opens a wide range of translational applications in cancer.
ISOGO: Functional annotation of protein-coding splice variants
The advent of RNA-seq technologies has switched the paradigm of genetic analysis from a genome
to a transcriptome-based perspective. Alternative splicing generates functional diversity in genes,
but the precise functions of many individual isoforms are yet to be elucidated. Gene Ontology was
developed to annotate gene products according to their biological processes, molecular functions and
cellular components. Although a single gene may have several gene products, most annotations are not
isoform-specific and do not distinguish the functions of the different proteins originating from a single
gene. Several approaches have tried to automatically annotate ontologies at the isoform level, but
this has proven to be a daunting task. We have developed ISOGO (ISOform+GO function imputation),
a novel algorithm to predict the function of coding isoforms based on their protein domains and their
correlation of expression across 11,373 cancer patients. Combining these two sources of information
outperforms previous approaches: it provides an area under the precision-recall curve (AUPRC) five times
larger than previous attempts and the median AUROC of the functions assigned to genes is 0.82. We tested
ISOGO predictions on some genes with isoform-specific functions (BRCA1, MADD, VAMP7 and ITSN1)
and they were coherent with the literature. In addition, we examined whether the main isoform of each
gene, as predicted by APPRIS, was the most likely to have the annotated gene functions; this holds
for 99.4% of the genes. We also evaluated the predictions for isoform-specific functions provided by
the CAFA3 challenge and the results were also convincing. To make these results available to the scientific
community, we have deployed a web application to consult ISOGO predictions
(https://biotecnun.unav.es/app/isogo). Initial data, website link, isoform-specific GO function predictions and R code are available
at https://gitlab.com/icassol/isogo
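For readers unfamiliar with the two metrics quoted in the abstract above, the following minimal Python sketch shows how AUROC and one common estimate of AUPRC (average precision) can be computed from binary labels and prediction scores. It is not the ISOGO evaluation code: the function names and the toy labels and scores are hypothetical, and the average-precision estimator shown here ignores tied scores.

```python
def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve.

    Items are ranked by decreasing score; the result is the mean of the
    precision values measured at the rank of each positive item.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, index in enumerate(order, start=1):
        if labels[index] == 1:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive label")
    return precision_sum / hits


# Hypothetical toy example: 1 = the isoform truly has the GO term, 0 = it does not.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]

print("AUROC:", auroc(toy_labels, toy_scores))
print("Average precision:", average_precision(toy_labels, toy_scores))
```

AUPRC is generally the more informative of the two when positives are rare, which is typical for function annotation, since the expected AUPRC of a random predictor equals the fraction of positives rather than 0.5.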
EventPointer 3.0: flexible and accurate splicing analysis that includes studying the differential usage of protein-domains
Alternative splicing (AS) plays a key role in cancer: all its hallmarks have been associated with different mechanisms of abnormal AS. The improvement of the human transcriptome annotation and the availability of fast and accurate software to estimate isoform concentrations have boosted the analysis of transcriptome profiling from RNA-seq. The statistical analysis of AS is a challenging problem not yet fully solved. We have included in EventPointer (EP), a Bioconductor package, a novel statistical method that can use the bootstrap of the pseudoaligners. We compared it with other state-of-the-art algorithms to analyze AS. Its performance is outstanding for shallow sequencing conditions. The statistical framework is very flexible since it is based on design and contrast matrices. EP now includes a convenient tool to find the primers to validate the discoveries using PCR. We also added a statistical module to study alterations in protein domains related to AS. Applying it to 9514 patients from TCGA and TARGET in 19 different tumor types resulted in two conclusions: (i) aberrant alternative splicing alters the relative presence of protein domains and (ii) the number of enriched domains is strongly correlated with the age of the patients.
Analysis of natural language processing tools for structuring medical texts.
The scarcity of structured information in the field of medicine over the years prevents the application in this area of new Artificial Intelligence technologies related to data analysis. New NLP applications have been created with the aim of processing medical texts automatically and thereby increasing the amount of structured data. This project aims to analyze the available tools and propose a solution for structuring medical texts according to the needs of the company Naru Intelligence. After analyzing and comparing the tools currently available for structuring medical texts, it was observed that many of them are of poor quality or are unable to structure arbitrary text. Amazon Comprehend Medical (an AWS service) is an interesting option for structuring medical texts; its main limitation is the supported language (English).
Interpretable precision medicine for acute myeloid leukemia
Precision medicine (PM) is a branch of medicine that defines a disease at a higher
resolution using genetic and other technologies to enable more specific targeting of its
subgroups. Because of its uses in clinical treatment and diagnostics, this field exemplifies
the modern era of medicine. PM looks for not just the right drug, but also the right dosage
and treatment regimen. PM encounters a variety of challenges, which will be explored in
this dissertation.
Large-scale sensitivity screens and whole-exome sequencing experiments (WES) have
fostered a new wave of targeted treatments based on finding associations between drug
sensitivity and response biomarkers. These experiments with the aid of state-of-the-art
artificial intelligence (AI) algorithms are opening new therapeutic opportunities for diseases
with unmet clinical needs. It has been shown that AI is capable of predicting novel
personalized treatments based on complex genotypic and phenotypic patterns in tumors.
The scientific community should make an effort to make these algorithms interpretable
to humans so that the results can be more easily approved by medical regulators. The
purpose of this thesis is to apply AI algorithms for precision oncology that are highly
accurate, while guaranteeing that the predictions are interpretable by humans.
This work is divided into three main sections. The first section comprises a new methodology
to increase the predictive power of the discovery of novel treatments in large-scale
screenings by exploiting the fact that some biomarkers tend to appear in many treatments. This
phenomenon is called the hub effect in gene essentiality (HUGE). The content of this section was published in
[1]. The second section contains a novel interpretable AI method, called multi-dimensional
module optimization (MOM), that associates drug screening with genetic events and
proposes a treatment guideline. The content of this section was published in [2]. Finally, the
third section includes a detailed comparison of different recently published algorithms that
attempt to overcome the barriers posed by today's precision medicine. This study also
includes two novel algorithms specifically designed to solve the challenges of applicability
to clinical practice: Optimal Decision Tree (ODT) and Multinomial Lasso.
The characterization of Interpretable Artificial Intelligence as an approach with strong potential
for use in clinical practice is one of the study's most significant achievements. We present unique methods for PM that are highly interpretable, and we summarize the requirements that
could be considered when constructing interpretable AI. We are confident that this approach will
transform the way PM is addressed, bridging the gap between AI and clinical practice.