16 research outputs found

    The promise of explainable deep learning for omics data analysis: Adding new discovery tools to AI

    Deep learning has already revolutionised the way a wide range of data is processed in many areas of daily life. The ability to learn abstractions and relationships from heterogeneous data has provided impressively accurate prediction and classification tools to handle increasingly big datasets. This has a significant impact on the growing wealth of omics datasets, offering an unprecedented opportunity for a better understanding of the complexity of living organisms. While this revolution is transforming the way these data are analysed, explainable deep learning is emerging as an additional tool with the potential to change the way biological data are interpreted. Explainability addresses critical issues such as transparency, which is especially important when computational tools are introduced in clinical environments. Moreover, it empowers artificial intelligence with the capability to provide new insights into the input data, thus adding an element of discovery to these already powerful resources. In this review, we provide an overview of the transformative effects explainable deep learning is having on multiple sectors, ranging from genome engineering and genomics to radiomics, drug design and clinical trials. We offer life scientists a perspective to better understand the potential of these tools, and a motivation to implement them in their research, by suggesting learning resources they can use to take their first steps in this field.

    hgtseq: A Standard Pipeline to Study Horizontal Gene Transfer

    Horizontal gene transfer (HGT) is well described in prokaryotes: it plays a crucial role in evolution, and has functional consequences in insects and plants. However, less is known about HGT in humans. Studies have reported bacterial integrations in cancer patients, and microbial sequences have been detected in data from well-known human sequencing projects. Few of the existing tools for investigating HGT are highly automated. Thanks to the adoption of Nextflow for life sciences workflows, and to the standards and best practices curated by communities such as nf-core, fully automated, portable, and scalable pipelines can now be developed. Here we present nf-core/hgtseq to facilitate the analysis of HGT from sequencing data in different organisms. We showcase its performance by analysing six exome datasets from five mammals. Hgtseq can be run seamlessly in any computing environment and accepts data generated by existing exome and whole-genome sequencing projects; this will enable researchers to expand their analyses into this area. Fundamental questions are still open about the mechanisms and the extent or role of horizontal gene transfer: by releasing hgtseq we provide a standardised tool which will enable a systematic investigation of this phenomenon, thus paving the way for a better understanding of HGT

    Prediction of Metabolic Profiles from Transcriptomics Data in Human Cancer Cell Lines

    The metabolome and transcriptome communicate with each other within cancer cells, and this interplay gives rise to quantifiable correlation structures between gene expression and metabolite abundance levels. Studying these correlations could provide a novel avenue for understanding cancer and discovering new biomarkers and pharmacological strategies, as well as laying the foundation for predicting metabolite quantities by leveraging information from the more widespread transcriptomics data. In the current paper, we investigate the correlation between gene expression and metabolite levels in the Cancer Cell Line Encyclopedia dataset, building a direct correlation network between the two molecular ensembles. We show that a metabolite/transcript correlation network can be used to predict metabolite levels in different samples and datasets, such as the NCI-60 cancer cell line dataset, both on a sample-by-sample basis and in differential contrasts. We also show that metabolite levels can be predicted in principle on any sample and dataset for which transcriptomics data are available, such as The Cancer Genome Atlas (TCGA).
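
    The idea of predicting a metabolite from correlated transcripts can be illustrated with a minimal sketch. It is not the method of the paper: the function names, the correlation threshold of 0.8, and the correlation-weighted average used as a predictor are all assumptions made for illustration, and the output is a relative score rather than a concentration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def correlation_edges(expression, metabolite, threshold=0.8):
    """Keep gene -> r edges whose absolute correlation with the metabolite
    reaches the threshold. The threshold is a hypothetical placeholder."""
    edges = {}
    for gene, values in expression.items():
        r = pearson(values, metabolite)
        if abs(r) >= threshold:
            edges[gene] = r
    return edges


def predict_score(edges, sample_z):
    """Correlation-weighted mean of z-scored expression values for one sample.
    Assumed predictor for illustration; returns a relative score."""
    total_weight = sum(abs(r) for r in edges.values())
    if total_weight == 0:
        return 0.0
    return sum(r * sample_z[gene] for gene, r in edges.items()) / total_weight


# Toy training data: three genes and one metabolite measured in four samples.
expression = {
    "G1": [1, 2, 3, 4],  # rises with the metabolite
    "G2": [4, 3, 2, 1],  # falls as the metabolite rises
    "G3": [2, 1, 4, 3],  # only weakly related, dropped by the threshold
}
metabolite = [10, 20, 30, 40]

edges = correlation_edges(expression, metabolite)
score = predict_score(edges, {"G1": 1.0, "G2": -1.0, "G3": 0.0})
```

    In this toy case a new sample with high G1 and low G2 receives a positive score, that is, a metabolite level predicted to be above average.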

    A multi-parametric workflow for the prioritization of mitochondrial DNA variants of clinical interest

    Assigning a pathogenic role to mitochondrial DNA (mtDNA) variants and unveiling the potential involvement of the mitochondrial genome in diseases are challenging tasks in human medicine. Assuming that rare variants are more likely to be damaging, we designed a phylogeny-based prioritization workflow to obtain a reliable pool of candidate variants for further investigations. The prioritization workflow relies on an exhaustive functional annotation through the mtDNA extraction pipeline MToolBox and includes Macro Haplogroup Consensus Sequences to filter out fixed evolutionary variants and report rare or private variants, the nucleotide variability as reported in HmtDB, and the disease score based on several predictors of pathogenicity for non-synonymous variants. Cutoffs for both the disease score and the nucleotide variability index were established with the aim of discriminating sequence variants contributing to defective phenotypes. The workflow was validated on mitochondrial sequences from individuals affected by Leber's Hereditary Optic Neuropathy, successfully identifying 23 variants including the majority of the known causative ones. Applying the prioritization workflow to cancer datasets made it possible to trim down the number of candidates for subsequent functional analyses, unveiling among these a high percentage of somatic variants. Prioritization criteria were implemented in both the standalone ( http://sourceforge.net/projects/mtoolbox/ ) and web ( https://mseqdr.org/mtoolbox.php ) versions of MToolBox.
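
    The three filters described above (haplogroup consensus, nucleotide variability, disease score) can be sketched as a simple filter. This is not MToolBox code: the `Variant` fields, the cutoff values, and the direction of each comparison (low variability and high disease score are kept) are assumptions chosen only to illustrate the logic.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str
    nucleotide_variability: float  # site variability, e.g. as reported in HmtDB
    disease_score: float           # combined pathogenicity prediction
    in_haplogroup_consensus: bool  # True if fixed in the macro-haplogroup consensus


# Placeholder cutoffs for illustration only; these are not the published values.
MAX_VARIABILITY = 0.01
MIN_DISEASE_SCORE = 0.5


def prioritize(variants, max_variability=MAX_VARIABILITY,
               min_disease_score=MIN_DISEASE_SCORE):
    """Drop haplogroup-consensus variants, then keep rare sites with a high
    disease score, ranked from highest to lowest score."""
    kept = [
        v for v in variants
        if not v.in_haplogroup_consensus
        and v.nucleotide_variability <= max_variability
        and v.disease_score >= min_disease_score
    ]
    return sorted(kept, key=lambda v: v.disease_score, reverse=True)


# Hypothetical variants, named generically to avoid implying real positions.
candidates = [
    Variant("variant_1", 0.001, 0.9, False),  # rare and high score: kept
    Variant("variant_2", 0.2, 0.95, False),   # highly variable site: dropped
    Variant("variant_3", 0.002, 0.2, False),  # low disease score: dropped
    Variant("variant_4", 0.0, 0.7, True),     # haplogroup consensus: dropped
    Variant("variant_5", 0.005, 0.6, False),  # rare and above cutoff: kept
]
prioritized = prioritize(candidates)
```

    Passing the cutoffs as parameters keeps the sketch usable with whichever thresholds a given study establishes.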

    Extraction and annotation of human mitochondrial genomes from 1000 Genomes Whole Exome Sequencing data

    Background: Whole Exome Sequencing (WES) is one of the most used and cost-effective next-generation technologies that allows sequencing of all nuclear exons. Off-target regions may be captured if they present high sequence similarity with baits. Bioinformatics tools have been optimized to retrieve a large amount of WES off-target mitochondrial DNA (mtDNA) by exploiting the aspecificity of probes, which partially overlap Nuclear mitochondrial Sequences (NumtS). The 1000 Genomes project represents one of the widest resources from which to extract mtDNA sequences from WES data, considering the large effort the scientific community is undertaking to reconstruct human population history using mtDNA as a marker, and the involvement of mtDNA in pathology. Results: A previously published pipeline aimed at assembling mitochondrial genomes from off-target WES reads, further improved to detect insertions and deletions (indels) and heteroplasmy, was applied to a dataset of 1242 samples from the 1000 Genomes project and made it possible to obtain a nearly complete mitochondrial genome from 943 samples (76% of the analyzed exomes). The robustness of our computational strategy was highlighted by the reduction, due to NumtS filtering, in the number of reads recognized as mitochondrial in the original annotation produced by the Consortium. An accurate survey was carried out on 1242 individuals. 215 indels, mostly heteroplasmic, and 3407 single base variants were mapped. A homogeneous distribution of mismatches was observed along the whole mitochondrial genome, while a lower frequency of indels was found within protein-coding regions, where frameshift mutations may be deleterious. The majority of indels and mismatches found were not previously annotated in mitochondrial databases, since conventional sequencing methods were limited to homoplasmy or quasi-homoplasmy detection. Intriguingly, upon filtering out non haplogroup-defining variants, we detected a widespread population occurrence of rare events predicted to be damaging. Finally, samples were stratified into blood- and lymphoblastoid-derived to detect possibly different trends of mutability in the two datasets, an analysis which did not yield significant discordances. Conclusions: To the best of our knowledge, this is likely the most extended population-scale mitochondrial genotyping in humans enriched with the estimation of heteroplasmies.

    MToolBox: a highly automated pipeline for heteroplasmy annotation and prioritization analysis of human mitochondrial variants in high-throughput sequencing

    Motivation: The increasing availability of mitochondria-targeted and off-target sequencing data in Whole Exome and Genome Sequencing studies (WXS and WGS) has raised the demand for effective pipelines to accurately measure heteroplasmy and to easily recognize the most functionally important mitochondrial variants among a huge number of candidates. For this purpose we developed MToolBox, a highly automated pipeline to reconstruct and analyze human mitochondrial DNA from high-throughput sequencing data. Results: MToolBox implements an effective computational strategy for mitochondrial genome assembly and haplogroup assignment, also including a prioritization analysis of detected variants. MToolBox provides a Variant Call Format (VCF) file featuring, for the first time, allele-specific heteroplasmy, and annotation files with prioritized variants. MToolBox was tested on simulated samples and applied to 1000 Genomes WXS datasets. Availability: The MToolBox package is available at https://sourceforge.net/projects/mtoolbox/

    Breast Cancer Organoids Model Patient-Specific Response to Drug Treatment

    Simple Summary: The possibility to generate in the laboratory faithful models of patients' tumors is of primary importance to capture cancer complexity and study therapy response in a personalized setting. Tumor organoids are 3D cell cultures, obtained from patients' tumor tissues, that recapitulate several characteristics of the original tumor, thus representing a clinically relevant patient avatar. This study reports the generation and the molecular characterization of patient-derived organoids from invasive breast carcinomas. Our results proved the usefulness of these cancer models for designing patient-specific therapeutic approaches to treat highly aggressive cancers, but also highlighted the need to further improve this methodology to overcome its current limitations. Tumor organoids are tridimensional cell culture systems that are generated in vitro from surgically resected patients' tumors. They can be propagated in culture maintaining several features of the tumor of origin, including cellular and genetic heterogeneity, thus representing a promising tool for precision cancer medicine. Here, we established patient-derived tumor organoids (PDOs) from different breast cancer subtypes (luminal A, luminal B, human epidermal growth factor receptor 2 (HER2)-enriched, and triple negative). The established model systems showed histological and genomic concordance with parental tumors. However, in PDOs, the ratio of diverse cell populations was frequently different from that originally observed in parental tumors. We showed that tumor organoids represent a valuable system to test the efficacy of standard therapeutic treatments and to identify drug resistant populations within tumors. We also report that inhibitors of mechanosignaling and of Yes-associated protein 1 (YAP) activation can restore chemosensitivity in drug resistant tumor organoids.

    HmtDB 2016: data update, a better performing query system and human mitochondrial DNA haplogroup predictor

    The HmtDB resource hosts a database of human mitochondrial genome sequences from individuals with healthy and disease phenotypes. The database is intended to support both population geneticists and clinicians undertaking the task of assessing the pathogenicity of specific mtDNA mutations. The wide application of next-generation sequencing (NGS) has provided an enormous volume of high-resolution data at a low price, increasing the availability of human mitochondrial sequencing data. This called for a significant expansion of the HmtDB data content, which has more than tripled in the current release. We here describe additional novel features, including: (i) a complete, user-friendly restyling of the web interface, (ii) links to the command-line stand-alone and web versions of the MToolBox package, an up-to-date tool to reconstruct and analyze human mitochondrial DNA from NGS data and (iii) the implementation of the Reconstructed Sapiens Reference Sequence (RSRS) as mitochondrial reference sequence. The overall update makes HmtDB an even handier and more useful resource, as it enables more rapid data access, processing and analysis. HmtDB is accessible at http://www.hmtdb.uniba.it/