5 research outputs found

    CpG-Islands as Markers for Liquid Biopsies of Cancer Patients

    The analysis of tumours using biomarkers in blood is transforming cancer diagnosis and therapy. Cancers are characterised by evolving genetic alterations, which makes it difficult to develop reliable and broadly applicable DNA-based biomarkers for liquid biopsy. In contrast to the variability of gene mutations, the methylation pattern remains largely constant during carcinogenesis. Methylation analysis may therefore be better suited than mutation analysis to recognising tumour features in the blood of patients. In this work, we investigated whether global CpG island methylation profiles can serve as a basis for predicting the cancer status of patients from liquid biopsy samples. (CpG denotes a cytosine followed by a guanine in the context of methylation; the "p" stands for the phosphate linking them and distinguishes CG sites subject to methylation from other CG motifs or from references to CG content.) We retrieved existing GEO methylation datasets on hepatocellular carcinoma (HCC), on cell-free DNA (cfDNA) from HCC patients and healthy donors, and on healthy whole blood and purified peripheral blood mononuclear cell (PBMC) samples, and used a random forest classifier as the predictor. Additionally, we tested three different feature selection techniques in combination with the classifier. When using cfDNA samples together with solid tumour samples and healthy blood samples of different origin, we achieved an average accuracy of 0.98 in 10-fold cross-validation. In this setting, all the feature selection methods we tested showed promising results. We also showed that solid tumour samples and purified PBMCs can be used as a training set to correctly predict whether a cfDNA sample is cancerous or healthy. In contrast to the results on the complete set of samples, the feature selection methods led to varying results for the respective random forests with this training set. ANOVA feature selection worked well, and the selected features allowed the random forest to predict all cfDNA samples correctly. Feature selection based on mutual information could also yield better-than-random results, but LASSO feature selection did not lead to a confident prediction. Our results show the relevance of CpG islands as tumour markers in blood.
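    The abstract names ANOVA feature selection as the method that worked best with the tumour/PBMC training set but gives no implementation details. The sketch below is a minimal, pure-Python illustration of ANOVA-style selection, assuming a samples-by-CpG-island matrix of methylation beta values: it scores each island with a one-way ANOVA F-statistic between classes and keeps the top k. The matrix, labels and function names are hypothetical.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F-statistic of one feature across class labels."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    k, n = len(groups), len(values)
    if k < 2:
        raise ValueError("need at least two classes")
    grand_mean = mean(values)
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_top_k(matrix, labels, k):
    """Indices of the k columns (CpG islands) with the highest F-statistic."""
    n_features = len(matrix[0])
    scores = [anova_f([row[j] for row in matrix], labels) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


# Toy beta values (rows = samples, columns = CpG islands); hypothetical data.
matrix = [
    [0.90, 0.50, 0.60],  # tumour
    [0.85, 0.40, 0.55],  # tumour
    [0.10, 0.45, 0.40],  # healthy
    [0.15, 0.55, 0.35],  # healthy
]
labels = ["tumour", "tumour", "healthy", "healthy"]
print(select_top_k(matrix, labels, 2))  # [0, 2]
```

    In scikit-learn, `SelectKBest(f_classif, k=...)` computes the same per-feature F-statistic and can be chained with `RandomForestClassifier` in a `Pipeline` scored by `cross_val_score(..., cv=10)`; the abstract does not say which toolkit was used. Placing the selection step inside the cross-validation folds keeps the held-out samples from influencing which features are chosen.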

    Batch effect detection and correction in RNA-seq data using machine-learning-based automated assessment of quality

    Background: The constant evolution and development of next-generation sequencing techniques has led to high-throughput data composed of datasets that include a large number of biological samples. Large numbers of samples are usually processed experimentally in batches, which can greatly affect sample quality and confound further statistical analyses, yet scientific publications often omit this batch information. Because dedicated bioinformatics methods developed to detect unwanted sources of variance in the data can wrongly detect real biological signals, such methods could benefit from a quality-aware approach. Results: We recently developed statistical guidelines and a machine learning tool to automatically evaluate the quality of a next-generation sequencing sample. We leveraged this quality assessment to detect and correct batch effects in 12 publicly available RNA-seq datasets with available batch information. We were able to distinguish batches by our quality score and used it to correct for some batch effects in sample clustering. Overall, the correction was evaluated as comparable to or better than the reference method that uses a priori knowledge of the batches (in 10 and 1 of 12 datasets, respectively; total = 92%). When coupled with outlier removal, the correction was more often evaluated as better than the reference (comparable or better in 5 and 6 of 12 datasets, respectively; total = 92%). Conclusions: In this work, we show the capabilities of our software to detect batches in public RNA-seq datasets from differences in the predicted quality of their samples. We also use these insights to correct the batch effect and observe the relation between sample quality and batch effect. These observations reinforce our expectation that, while batch effects do correlate with differences in quality, batch effects also arise from other artifacts and are more suitably corrected statistically in well-designed experiments.
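    The abstract does not describe how the predicted quality score is used to correct the data. One simple possibility, assumed here purely for illustration, is to treat the per-sample quality score as a covariate and regress it out of each gene's expression with ordinary least squares, keeping the residuals. The function name and input values below are hypothetical.

```python
from statistics import mean


def regress_out(expression, quality):
    """Remove the linear effect of a per-sample quality score from one gene."""
    q_mean, e_mean = mean(quality), mean(expression)
    sxx = sum((q - q_mean) ** 2 for q in quality)
    if sxx == 0:
        return list(expression)  # quality is constant: nothing to remove
    # Ordinary least-squares slope of expression on quality.
    slope = sum((q - q_mean) * (e - e_mean) for q, e in zip(quality, expression)) / sxx
    # Residuals, shifted back to the gene's original mean.
    return [e - slope * (q - q_mean) for q, e in zip(quality, expression)]


quality = [0.1, 0.2, 0.8, 0.9]         # hypothetical per-sample quality scores
expression = [10.5, 11.0, 14.0, 14.5]  # one gene that tracks quality exactly
print(regress_out(expression, quality))  # all values approximately 12.5
```

    For a full matrix, the same call is applied gene by gene, e.g. `[regress_out(gene, quality) for gene in genes]`. This only removes variation that is linearly explained by quality, which fits the abstract's conclusion that batch effects also arise from other artifacts.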

    Analysis of bacterial pangenomes illuminates the dark matter of CRISPR-Cas systems

    Abstract of the work presented at the XIII Meeting of the Molecular Microbiology Specialised Group of the Spanish Society for Microbiology (SEM), held in Granada (Spain) from 7 to 9 September 2022. CRISPR-Cas systems are immunity modules that bacteria acquire by horizontal transfer. They consist of genes generically called cas genes and a series of sequences called spacers, which derive from earlier entries of exogenous DNA, mainly mobile genetic elements such as phages and plasmids. In 2017, the term CRISPR dark matter was established to refer to the unknown origin of 90% of spacers, which do not resemble any known sequence. The genomic era now makes it possible to analyse thousands of bacterial genomes and build pangenomes, which are useful for studying the accessory genome of a given species. Our group has computationally analysed nearly 70,000 genomes from six clinically relevant bacterial species belonging to the ESKAPE group. By building pangenomes of these species, we found that the two Gram-positive species of this group (Enterococcus faecium and Staphylococcus aureus) carry CRISPR-Cas systems in only 1% of their genomes, whereas among the Gram-negative species (Klebsiella pneumoniae, Acinetobacter baumannii, Pseudomonas aeruginosa and Enterobacter cloacae) this figure approaches 50% in some species. Using machine learning, we searched for genes frequently associated with CRISPR-Cas systems and found that a significant number of them encode membrane proteins. These proteins are part of, for example, systems of resistance to different types of stress. In addition, some of them are known to act as bacteriophage receptors, so we hypothesised that bacteria carrying them might need CRISPR-Cas systems to control phage infection. This hypothesis was validated by studying the prophages and spacers present in these bacterial strains. Additionally, thanks to the large number of genomes analysed, we were able to drastically reduce CRISPR dark matter from 90% to nearly 20% in three of the species analysed, which shows that studying pangenomes provides greater insight into their CRISPR-Cas systems. We are currently analysing in detail the sequences recognised by the spacers of the different types of CRISPR-Cas systems and have found an interesting relationship with so-called phage-plasmids, phages that under certain circumstances remain as extrachromosomal elements. This work was supported by the Ministerio de Ciencia e Innovación (PID2020-114861GB-I00) and by the European Regional Development Fund and the Consejería de Transformación Económica, Industria, Conocimiento y Universidades de la Junta de Andalucía (PY20_00871).
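    CRISPR dark matter, as used here, is the fraction of spacers that match no known sequence. The toy sketch below computes that fraction under stated assumptions: a spacer counts as "explained" if it aligns without gaps, with at most two mismatches, to either strand of a known phage or plasmid sequence. The abstract does not give the search tool or thresholds; real analyses typically rely on an aligner such as BLASTN against sequence databases. All names and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string (non-ACGT characters kept as-is)."""
    return seq.translate(COMPLEMENT)[::-1]


def matches(spacer, target, max_mismatches=2):
    """True if the spacer aligns without gaps to either strand of the target."""
    width = len(spacer)
    for strand in (target, reverse_complement(target)):
        for start in range(len(strand) - width + 1):
            mismatches = 0
            for a, b in zip(spacer, strand[start:start + width]):
                if a != b:
                    mismatches += 1
                    if mismatches > max_mismatches:
                        break
            if mismatches <= max_mismatches:
                return True
    return False


def dark_matter_fraction(spacers, targets, max_mismatches=2):
    """Fraction of spacers with no hit in any known phage/plasmid sequence."""
    if not spacers:
        raise ValueError("no spacers given")
    unmatched = sum(
        1 for s in spacers
        if not any(matches(s, t, max_mismatches) for t in targets)
    )
    return unmatched / len(spacers)


targets = ["ATGCATGCAAAATTTTGGGGCCCC"]          # toy "known" mobile element
spacers = ["AAAATTTT", "CCCCAAAA", "CGCGCGCG"]  # forward hit, reverse hit, no hit
print(dark_matter_fraction(spacers, targets))    # 0.3333333333333333
```

    Adding more reference genomes or mobile elements to `targets` can only lower this fraction, which is the intuition behind using large pangenomes to shrink CRISPR dark matter.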

    Analysis of bacterial pangenomes reduces CRISPR dark matter and reveals strong association between membranome and CRISPR-Cas systems

    CRISPR-Cas systems are prokaryotic acquired immunity mechanisms found in 40% of bacterial genomes. They prevent viral infections through small DNA fragments called spacers. However, the vast majority of these spacers have not yet been associated with the virus they recognize; this unassigned fraction has been named CRISPR dark matter. By analyzing the spacers of tens of thousands of genomes from six bacterial species, we were able to reduce CRISPR dark matter from 80% to as low as 15% in some of the species. In addition, we observed that when a genome carries CRISPR-Cas systems, these are accompanied by particular sets of membrane proteins. Our results suggest that when bacteria carry membrane proteins that help them compete better in their environment, and these proteins are in turn receptors for specific phages, the bacteria would be forced to acquire CRISPR-Cas systems. This work was supported by MCIN/AEI/ PID2020-114861GB-I00 (Agencia Estatal de Investigación/Ministry of Science and Innovation of the Spanish Government) and by the European Regional Development Fund and the Consejería de Transformación Económica, Industria, Conocimiento y Universidades de la Junta de Andalucía (PY20_00871).
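    This abstract reports that CRISPR-Cas presence is accompanied by particular membrane-protein genes but does not state the statistic behind that association. One simple way to quantify co-occurrence across a pangenome, assumed here for illustration, is a 2x2 presence/absence table per gene summarised by an odds ratio, with 0.5 added to every cell (the Haldane-Anscombe correction) so that empty cells do not produce zero or infinite values. The input format and function name are hypothetical.

```python
def odds_ratio(genomes):
    """Haldane-corrected odds ratio for co-occurrence of a gene and CRISPR-Cas.

    genomes: iterable of (has_gene, has_crispr) booleans, one per genome.
    """
    a = b = c = d = 0
    for has_gene, has_crispr in genomes:
        if has_gene and has_crispr:
            a += 1  # gene present, CRISPR-Cas present
        elif has_gene:
            b += 1  # gene present, CRISPR-Cas absent
        elif has_crispr:
            c += 1  # gene absent, CRISPR-Cas present
        else:
            d += 1  # both absent
    # Adding 0.5 to every cell keeps the ratio finite when a cell is empty.
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))


# Hypothetical presence/absence calls for one membrane-protein gene in 10 genomes.
genomes = [(True, True)] * 4 + [(True, False)] + [(False, True)] + [(False, False)] * 4
print(odds_ratio(genomes))  # 9.0
```

    A value well above 1 means the gene and CRISPR-Cas tend to occur together. Ranking genes by this ratio is a crude univariate alternative to the machine-learning approach the companion abstract mentions, and it ignores population structure between related genomes.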

    Analysis of RBP expression and binding sites identifies PTBP1 as a regulator of CD19 expression in B-ALL

    Despite massive improvements in the treatment of B-ALL through CART-19 immunotherapy, a large number of patients relapse due to loss of the targeted epitope. Mutations in the CD19 locus and aberrant splicing events are known to account for the absence of surface antigen. However, early molecular determinants of therapy resistance, as well as the time point at which the first signs of epitope loss become detectable, have not yet been elucidated. By deep sequencing of the CD19 locus, we identified a blast-specific 2-nucleotide deletion in intron 2 that is present in 35% of B-ALL samples at initial diagnosis. This deletion overlaps with the binding site of RNA-binding proteins (RBPs), including PTBP1, and might thereby affect CD19 splicing. Moreover, we identified a number of other RBPs, including NONO, that are predicted to bind to the CD19 locus and are deregulated in leukemic blasts. Their expression is highly heterogeneous across B-ALL molecular subtypes, as shown by analyzing 706 B-ALL samples accessed via the St. Jude Cloud. Mechanistically, we show that downregulation of PTBP1, but not of NONO, in the 697 cell line reduces total CD19 protein by increasing intron 2 retention. Isoform analysis of patient samples revealed that blasts at diagnosis show increased CD19 intron 2 retention compared with normal B cells. Our data suggest that loss of RBP function, through mutations altering their binding motifs or through deregulated expression, might contribute to the disease-associated accumulation of therapy-resistant CD19 isoforms.
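    The abstract does not say how CD19 intron 2 retention was quantified. A common way to express intron retention from RNA-seq read counts, assumed here, is a percent-intron-retention (PIR) style ratio: the average of reads spanning the exon-intron and intron-exon junctions, divided by that average plus the exon-exon spliced reads. The function name and read counts below are hypothetical.

```python
def percent_intron_retention(ei_reads, ie_reads, ee_reads):
    """PIR-style retention estimate for one intron, in percent.

    ei_reads: reads spanning the upstream exon-intron junction
    ie_reads: reads spanning the intron-downstream exon junction
    ee_reads: exon-exon spliced reads that skip the intron
    """
    retained = (ei_reads + ie_reads) / 2
    total = retained + ee_reads
    if total == 0:
        raise ValueError("no informative reads for this intron")
    return 100 * retained / total


# Hypothetical read counts for CD19 intron 2 in one sample.
print(percent_intron_retention(ei_reads=30, ie_reads=10, ee_reads=80))  # 20.0
```

    Comparing such values between blasts and normal B cells is one way to express the kind of difference the abstract describes; averaging the two junction counts makes the estimate less sensitive to uneven coverage at either end of the intron.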