
    GA4GH: International policies and standards for data sharing across genomic research and healthcare.

    The Global Alliance for Genomics and Health (GA4GH) aims to accelerate biomedical advances by enabling the responsible sharing of clinical and genomic data through both harmonized data aggregation and federated approaches. The decreasing cost of genomic sequencing (along with other genome-wide molecular assays) and increasing evidence of its clinical utility will soon drive the generation of sequence data from tens of millions of humans, with increasing levels of diversity. In this perspective, we present the GA4GH strategies for addressing the major challenges of this data revolution. We describe the GA4GH organization, which is fueled by the development efforts of eight Work Streams and informed by the needs of 24 Driver Projects and other key stakeholders. We present the GA4GH suite of secure, interoperable technical standards and policy frameworks and review the current status of standards, their relevance to key domains of research and clinical care, and the future plans of GA4GH. Broad international participation in building, adopting, and deploying GA4GH standards and frameworks will catalyze an unprecedented effort in data sharing that will be critical to advancing genomic medicine and ensuring that all populations can access its benefits.

    Bioinformatics for personal genomics: development and application of bioinformatic procedures for the analysis of genomic data

    In the last decade, the sharp decrease in sequencing costs driven by high-throughput technologies has completely changed how genetic problems are approached. In particular, whole-exome and whole-genome sequencing are contributing to extraordinary progress in the study of human variants, opening up new perspectives in personalized medicine. Because the field is relatively new and developing fast, efficient data production and analysis require appropriate tools and specialized knowledge. In line with this need, in 2014 the University of Padua funded the BioInfoGen Strategic Project with the goal of developing technology and expertise in bioinformatics and molecular biology applied to personal genomics. The aim of my PhD was to contribute to this challenge by implementing a series of innovative tools and applying them to investigate, and possibly solve, the case studies included in the project. I first developed an automated pipeline for Illumina data that sequentially performs each step needed to go from raw reads to somatic or germline variant detection. The system's performance was tested using internal controls and by applying it to a cohort of patients affected by gastric cancer, with interesting results. Once variants are called, they must be annotated to define their properties, such as their position at the transcript and protein level, their impact on the protein sequence, and their pathogenicity. Because most publicly available annotators were affected by systematic errors that caused low consistency in the final annotation, I implemented VarPred, a new variant annotation tool that achieves the best accuracy (>99%) compared with state-of-the-art programs while also showing good processing times. To make VarPred easy to use, I equipped it with an intuitive web interface that allows not only graphical evaluation of the results but also a simple filtering strategy.
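
The abstract does not describe how VarPred computes protein-level impact. As a minimal sketch of that one annotation step, assuming a single-nucleotide substitution in a spliced coding sequence (CDS) that is read in frame from its ATG start codon, a 0-based CDS position, and the standard genetic code, the affected codon can be translated before and after the change. The function `classify_snv`, the toy sequence, and the consequence labels (modelled on Sequence Ontology terms) are illustrative assumptions; this is not VarPred's code.

```python
from typing import Tuple

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_snv(cds: str, pos: int, alt: str) -> Tuple[int, str, str, str]:
    """Return (protein position, ref amino acid, alt amino acid, consequence).

    Hypothetical helper: `cds` starts at the ATG start codon and `pos` is the
    0-based position of the substituted base within it.
    """
    cds, alt = cds.upper(), alt.upper()
    if len(alt) != 1 or alt not in BASES or not 0 <= pos < len(cds):
        raise ValueError("expected one of A/C/G/T at a position inside the CDS")
    start = pos - pos % 3  # first base of the affected codon
    ref_codon = cds[start:start + 3]
    if len(ref_codon) != 3 or any(b not in BASES for b in ref_codon):
        raise ValueError("affected codon is incomplete or ambiguous")
    offset = pos % 3
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        consequence = "stop_retained_variant" if ref_aa == "*" else "synonymous_variant"
    elif start == 0:
        consequence = "start_lost"  # the initiating methionine is changed
    elif alt_aa == "*":
        consequence = "stop_gained"
    elif ref_aa == "*":
        consequence = "stop_lost"
    else:
        consequence = "missense_variant"
    return start // 3 + 1, ref_aa, alt_aa, consequence


if __name__ == "__main__":
    toy_cds = "ATGGCTTGGTAA"  # toy CDS encoding M-A-W-stop
    print(classify_snv(toy_cds, 4, "A"))  # (2, 'A', 'D', 'missense_variant')
```

A real annotator must additionally map genomic coordinates onto each overlapping transcript and handle strand, indels and splice sites, which this sketch deliberately leaves out.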
Furthermore, for effective user-driven prioritization of human genetic variants, I developed QueryOR, a web platform suited both to searching among known candidate genes and to finding novel gene-disease associations. QueryOR combines several innovative features that make it comprehensive, flexible and easy to use. Prioritization is achieved through a global positive selection process that promotes the emergence of the most reliable variants, rather than filtering out those that do not satisfy the applied criteria. QueryOR was used to analyze the two case studies framed within the BioInfoGen project. In particular, it enabled the detection of causative variants in patients affected by lysosomal storage diseases, also highlighting the efficacy of the designed sequencing panel. In the second case study, QueryOR simplified the recognition of the LRP2 gene as a possible candidate to explain subjects with a Dent disease-like phenotype but no mutation in the previously identified disease-associated genes, CLCN5 and OCRL. As a final corollary, an extensive analysis of recurrent exome variants was performed, showing that their origin can be mainly explained by inaccuracies in the reference genome, including misassembled regions and uncorrected bases, rather than by platform-specific errors.
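
QueryOR's actual criteria and scoring are not given here. The contrast the abstract draws, ranking variants by the criteria they satisfy instead of discarding any variant that fails one, can be sketched as a weighted score in which every variant is kept and sorted. The `Variant` fields, the criteria, their weights and the gene names are hypothetical illustrations, not QueryOR's implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal variant record."""
    var_id: str
    gene: str
    allele_freq: float  # population allele frequency
    damaging: bool      # e.g. flagged by some pathogenicity predictor


# Hypothetical criteria: (name, weight, predicate). Satisfying a criterion adds
# its weight; failing it adds nothing, so no variant is ever removed.
Criterion = Tuple[str, float, Callable[[Variant, Set[str]], bool]]
CRITERIA: List[Criterion] = [
    ("rare", 2.0, lambda v, genes: v.allele_freq < 0.01),
    ("damaging", 1.5, lambda v, genes: v.damaging),
    ("candidate_gene", 3.0, lambda v, genes: v.gene in genes),
]


def prioritize(variants: List[Variant], candidate_genes: Set[str]) -> List[Tuple[float, str]]:
    """Score every variant and return (score, id) pairs, best first."""
    scored = [
        (sum(weight for _, weight, test in CRITERIA if test(v, candidate_genes)), v.var_id)
        for v in variants
    ]
    # Highest score first; ties broken by id so the ranking is deterministic.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored


if __name__ == "__main__":
    toy = [
        Variant("var1", "GENE_A", 0.001, True),
        Variant("var2", "GENE_B", 0.2, False),
        Variant("var3", "GENE_A", 0.005, False),
        Variant("var4", "GENE_C", 0.3, False),
    ]
    for score, vid in prioritize(toy, {"GENE_B"}):
        print(vid, score)
```

In a strict filter chain requiring rarity, `var2` (common but in a candidate gene) would vanish; here it still ranks second, which is the behaviour a positive selection scheme is meant to preserve.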

    Biological heterogeneity in idiopathic pulmonary arterial hypertension identified through unsupervised transcriptomic profiling of whole blood

    Idiopathic pulmonary arterial hypertension (IPAH) is a rare but fatal disease diagnosed by right heart catheterisation and the exclusion of other forms of pulmonary arterial hypertension, producing a heterogeneous population with varied treatment response. Here we use unsupervised machine learning to identify three major patient subgroups that account for 92% of the cohort, each with unique whole blood transcriptomic and clinical feature signatures. These subgroups are associated with poor, moderate, and good prognosis. The poor prognosis subgroup is associated with upregulation of ALAS2 and downregulation of several immunoglobulin genes, while the good prognosis subgroup is defined by upregulation of the bone morphogenetic protein signalling regulator NOG and by the C/C variant of HLA-DPA1/DPB1 (independently associated with survival). These independently validated findings provide evidence for the existence of three major subgroups (endophenotypes) within the IPAH classification, which could improve risk stratification and provide molecular insights into the pathogenesis of IPAH.

    Unsupervised machine learning of high dimensional data for patient stratification

    The development mechanisms of numerous complex, rare diseases remain largely unknown, partly because of their multifaceted heterogeneity. Stratifying patients is becoming an important objective as research into this inherent heterogeneity advances, since stratification can be used towards personalised medicine. However, accurate patient stratification is hampered by considerable difficulties, mainly outdated clinical criteria, weak associations and simple symptom categories. Fortunately, great strides have been made in generating and using multiple omic data to produce new insights, for example through exploratory machine learning, which has shown potential for identifying the sources of disease mechanisms from patient subgroups. This work describes the development of a modular clustering toolkit, named Omada, designed to help researchers explore disease heterogeneity without extensive expertise in machine learning. It then assesses Omada’s capabilities and validity by testing the toolkit on multiple data modalities from pulmonary hypertension (PH) patients. I first demonstrate the toolkit’s ability to create biologically meaningful subgroups from whole blood RNA-seq data of H/IPAH patients in the manuscript “Biological heterogeneity in idiopathic pulmonary arterial hypertension identified through unsupervised transcriptomic profiling of whole blood”. Our work on the manuscript titled “Diagnostic miRNA signatures for treatable forms of pulmonary hypertension highlight challenges with clinical classification” applied the same clustering approach to a PH microRNA dataset as a first step towards microRNA diagnostic signatures, recognising the potential of microRNA expression to identify diverse disease sub-populations irrespective of pre-existing PH classes. The toolkit’s effectiveness on metabolite data was also tested.
Lastly, a longitudinal clustering approach was explored on activity readouts from wearables worn by COVID-19 patients as part of our manuscript “Unsupervised machine learning identifies and associates trajectory patterns of COVID-19 symptoms and physical activity measured via a smart watch”. Two clusters of high- and low-activity trajectories were generated and associated with symptom classes, showing a weak but interesting relationship between the two. In summary, this thesis examines the potential of patient stratification based on several data types that offer a new, previously unseen picture of disease mechanisms. The tools presented provide important indications of distinct patient groups and could generate the insights needed for further targeted research and clinical associations that can help towards understanding rare, complex diseases.
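
The abstract does not specify which algorithms Omada combines. As a generic, hedged illustration of unsupervised patient stratification, the sketch below runs k-means (Lloyd's algorithm) for several candidate cluster counts and keeps the count with the highest mean silhouette width. The function names, the seeded random initialisation and the toy two-feature data are assumptions for illustration; real omic data would first need normalisation and usually dimensionality reduction.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: List[Point], k: int, seed: int = 0, max_iter: int = 100) -> List[int]:
    """Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]  # k distinct points as initial centroids
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: _dist(p, centroids[c])) for p in points]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def silhouette(points: List[Point], labels: List[int]) -> float:
    """Mean silhouette width; undefined for one cluster, returned as -inf."""
    clusters = set(labels)
    if len(clusters) < 2:
        return float("-inf")
    total = 0.0
    for i, p in enumerate(points):
        # Mean distance from p to the other members of every cluster.
        mean_to = {}
        for c in clusters:
            d = [_dist(p, q) for j, q in enumerate(points) if labels[j] == c and j != i]
            if d:
                mean_to[c] = sum(d) / len(d)
        if labels[i] not in mean_to:  # singleton cluster: silhouette 0 by convention
            continue
        a = mean_to[labels[i]]
        b = min(v for c, v in mean_to.items() if c != labels[i])
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(points)


def choose_k(points: List[Point], k_values: Sequence[int], seed: int = 0) -> Tuple[int, List[int]]:
    """Return the k (and its labels) with the highest mean silhouette."""
    best_k, best_labels, best_score = k_values[0], [], float("-inf")
    for k in k_values:
        labels = kmeans(points, k, seed)
        score = silhouette(points, labels)
        if score > best_score:  # strict '>' keeps the smaller k on ties
            best_k, best_labels, best_score = k, labels, score
    return best_k, best_labels


if __name__ == "__main__":
    # Toy data: two well-separated groups of "patients" with two features each.
    data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 5.2), (5.2, 5.1)]
    print(choose_k(data, [2, 3]))
```

Silhouette is only one of several internal validity indices; a toolkit aimed at non-specialists would plausibly combine it with stability checks across resamples, which this sketch does not attempt.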

    Tracking Cancer Evolution Reveals Constrained Routes to Metastases: TRACERx Renal.

    Clear-cell renal cell carcinoma (ccRCC) exhibits a broad range of metastatic phenotypes that have not been systematically studied to date. Here, we analyzed 575 primary and 335 metastatic biopsies across 100 patients with metastatic ccRCC, including two cases sampled at post-mortem. Metastatic competence was afforded by chromosome complexity, and we identify 9p loss as a highly selected event driving metastasis and ccRCC-related mortality (p = 0.0014). Distinct patterns of metastatic dissemination were observed, including rapid progression to multiple tissue sites seeded by primary tumors of monoclonal structure. By contrast, we observed attenuated progression in cases characterized by high primary tumor heterogeneity, with metastatic competence acquired gradually and initial progression to solitary metastasis. Finally, we observed early divergence of primitive ancestral clones and protracted latency of up to two decades as a feature of pancreatic metastases.

    Actionable perturbations of damage responses by TCL1/ATM and epigenetic lesions form the basis of T-PLL

    T-cell prolymphocytic leukemia (T-PLL) is a rare mature T-cell malignancy with a poor prognosis. Here we integrated large-scale profiling data on alterations in gene expression, allelic copy number (CN), and nucleotide sequences in 111 well-characterized patients. Besides prominent signatures of T-cell activation and prevalent clonal variants, we also identify novel hot-spots for CN variability, fusion molecules, alternative transcripts, and progression-associated dynamics. The overall lesional spectrum of T-PLL maps mainly to the axes of DNA damage responses, T-cell receptor/cytokine signaling, and histone modulation. We formulate a multi-dimensional model of T-PLL pathogenesis centered on a unique combination of TCL1 overexpression with damaging ATM aberrations as initiating core lesions. The effects imposed by TCL1 cooperate with compromised ATM toward a leukemogenic phenotype of impaired DNA damage processing. Dysfunctional ATM appears inefficient in alleviating elevated redox burdens and telomere attrition and in evoking a p53-dependent apoptotic response to genotoxic insults. As a non-genotoxic strategy, synergistic combinations of p53 reactivators and deacetylase inhibitors reinstate this cell death execution. Peer reviewed.

    Therapeutic and prognostic strategies in neuroblastoma : exploring nuclear hormone receptors, MYC targets, and DIAPH3

    Neuroblastoma (NB) is a pediatric cancer derived from cells of neural crest origin that form the sympathoadrenal system. Typically, the tumor cells migrate along the spinal cord and spread to the chest, neck, and/or abdomen. Different clinical behaviors are observed in this disease: some tumors spontaneously regress without treatment, while others are highly aggressive and resistant to current therapies. Approximately 40% of high-risk NB patients have MYCN amplification, while 10% have overexpression of MYC (encoding c-MYC). These patients have undifferentiated tumors with a poor prognosis. Our group previously found that the expression and activation of the nuclear hormone receptors (NHRs) estrogen receptor alpha (ERα), by 17-β-estradiol (E2), and the glucocorticoid receptor (GR), by dexamethasone (DEX), could trigger differentiation by disrupting MYCN's regulation of the miR-17~92 microRNA cluster. In Paper I, we investigated whether simultaneous activation of ERα and GR has a more beneficial effect than activation of either receptor alone. We examined cell survival, changes in cell shape as indicated by neurite extension, variations in metabolic pathways, and accumulation of lipid droplets, and performed xenograft experiments. Our findings revealed that simultaneous activation of GR and ERα, compared with single activation, led to reduced viability and more robust differentiation. This dual activation also caused changes in glycolysis and oxidative phosphorylation, increased lipid droplet accumulation, and decreased aggressiveness in mouse models. Triple activation, adding activation of the retinoic acid receptor by all-trans retinoic acid (ATRA), amplified the differentiation phenotype. Bulk sequencing analysis showed that high NHR levels in patients are associated with favorable survival and clinical outcome.
In summary, our data suggest that combined activation of these NHRs could be a potential differentiation-inducing treatment. Paper II investigates target genes of c-MYC and MYCN to explore whether their expression yields better prognosis prediction than the expression of MYC and/or MYCN alone. In addition, we analyzed whether the predictive power of c-MYC and MYCN target genes differs, and examined their different roles during sympathoadrenal development. We screened lists of target genes using comprehensive approaches, including differential expression analysis between clinical risk groups, INSS stages, MYCN amplification status and progression status; univariate Cox regression analysis to select target genes with prognostic power; and protein interaction network analysis to select genes that share a meaningful biological function. After training and validating LASSO regression prediction models in three different patient cohorts (SEQC, Kocak, and Versteeg), we found that a risk score computed on c-MYC/MYCN target genes with prognostic value could effectively classify patients into groups with different survival probabilities. The high-risk group of patients exhibited unfavorable clinical outcomes and low survival rates. Further, single-cell RNA sequencing analysis revealed that c-MYC and MYCN targets have different expression patterns during sympathoadrenal development. Notably, genes linked to adverse outcomes were predominantly expressed in sympathoblasts compared with chromaffin cells. In summary, our research provides new insights into the importance of c-MYC/MYCN target genes during sympathoadrenal development and their value in predicting patient outcome. In Paper III, we studied the function of a member of the formin protein family involved in cytoskeleton modulation: Diaphanous Related Formin 3 (DIAPH3).
We found that high DIAPH3 expression in NB tumors is associated with MYCN amplification, higher stage, higher risk, progression and negative clinical outcome. Elevated DIAPH3 expression was also found in specific cells during mouse sympathoadrenal development and in progenitor cells of the postnatal human adrenal gland. Furthermore, knockdown of DIAPH3 resulted in a slight decrease in cell growth and in cell cycle arrest. Our study suggests that DIAPH3 could be a promising target for new therapeutic strategies.
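
The Paper II summary reports that a risk score built on prognostic c-MYC/MYCN target genes separates patients into survival groups, but it does not give the fitted model. If the LASSO model is a penalised Cox model (an assumption; the text says only "LASSO regression"), the score for a patient is typically the linear predictor r = Σ β_g · x_g over the genes with nonzero coefficients. The sketch below applies such coefficients and splits patients at the cohort median. The gene names, coefficients, expression values and the median cutoff are all hypothetical, and fitting the penalised model itself is not shown.

```python
from statistics import median
from typing import Dict, Optional

# Hypothetical fitted coefficients for genes retained by the penalty (nonzero beta).
COEFFICIENTS: Dict[str, float] = {"GENE_A": 0.8, "GENE_B": -0.5}


def risk_scores(expression: Dict[str, Dict[str, float]],
                coefficients: Dict[str, float]) -> Dict[str, float]:
    """Linear predictor per patient: sum of beta_g * expression_g."""
    return {
        patient: sum(beta * genes[gene] for gene, beta in coefficients.items())
        for patient, genes in expression.items()
    }


def stratify(scores: Dict[str, float], cutoff: Optional[float] = None) -> Dict[str, str]:
    """Label patients 'high' or 'low' risk; defaults to a median split (an assumption)."""
    if cutoff is None:
        cutoff = median(scores.values())
    return {patient: "high" if s > cutoff else "low" for patient, s in scores.items()}


if __name__ == "__main__":
    # Toy, made-up expression values for four patients.
    expression = {
        "P1": {"GENE_A": 2.0, "GENE_B": 1.0},
        "P2": {"GENE_A": 0.5, "GENE_B": 2.0},
        "P3": {"GENE_A": 1.0, "GENE_B": 0.0},
        "P4": {"GENE_A": 0.0, "GENE_B": 1.0},
    }
    print(stratify(risk_scores(expression, COEFFICIENTS)))
    # {'P1': 'high', 'P2': 'low', 'P3': 'high', 'P4': 'low'}
```

When a score is carried over to a validation cohort, the cutoff would typically be fixed on the training cohort rather than recomputed, which is why `stratify` accepts an explicit `cutoff`.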