1,347 research outputs found

    PyGNA: A unified framework for geneset network analysis

    Input data and results for the manuscript "PyGNA: a unified framework for geneset network analysis". Manifest files describe the content of each file. This work has been supported by the Wellcome Trust Seed Award in Science (207769/A/17/Z) to G.S.

    Machine learning and large scale cancer omic data: decoding the biological mechanisms underpinning cancer

    Many of the mechanisms underpinning cancer risk and tumorigenesis are still not fully understood. However, the next-generation sequencing revolution and the rapid advances in big data analytics allow us to study cells and complex phenotypes at unprecedented depth and breadth. While experimental and clinical data are still fundamental to validate findings and confirm hypotheses, computational biology is key for the analysis of system- and population-level data for detection of hidden patterns and the generation of testable hypotheses. In this work, I tackle two main questions regarding cancer risk and tumorigenesis that require novel computational methods for the analysis of system-level omic data. First, I focused on how frequent, low-penetrance inherited variants modulate cancer risk in the broader population. Genome-Wide Association Studies (GWAS) have shown that Single Nucleotide Polymorphisms (SNPs) contribute to cancer risk with multiple subtle effects, but these studies still fail to give further insight into the SNPs' synergistic effects. I developed a novel hierarchical Bayesian regression model, BAGHERA, to estimate heritability at the gene level from GWAS summary statistics. I then used BAGHERA to analyse data from 38 malignancies in the UK Biobank. I showed that genes with high heritable risk are involved in key processes associated with cancer and are often localised in genes that are somatically mutated drivers. Heritability analysis, like many other omics analysis methods, studies the effects of DNA variants on single genes in isolation. However, we know that most biological processes require the interplay of multiple genes, and we often lack a broad perspective on them. For the second part of this thesis, I then worked on the integration of Protein-Protein Interaction (PPI) graphs and omics data, which bridges this gap and recapitulates these interactions at a system level.
First, I developed a modular and scalable Python package, PyGNA, that enables robust statistical testing of genesets' topological properties. PyGNA complements the literature with a tool that can be routinely introduced in automated bioinformatics pipelines. With PyGNA I processed multiple genesets obtained from genomics and transcriptomics data. However, topological properties alone have proven to be insufficient to fully characterise complex phenotypes. Therefore, I focused on a model that combines topological and functional data to detect multiple communities associated with a phenotype. Detecting cancer-specific submodules is still an open problem, but it has the potential to elucidate mechanisms detectable only by integrating multi-omics data. Building on the recent advances in Graph Neural Networks (GNN), I present a supervised geometric deep learning model that combines GNNs and Stochastic Block Models (SBM). The model is able to learn multiple graph-aware representations, as multiple joint SBMs, of the attributed network, accounting for nodes participating in multiple processes. The simultaneous estimation of structure and function provides an interpretable picture of how genes interact in specific conditions and allows the detection of novel putative pathways associated with cancer.
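
As a rough illustration of what statistical testing of a geneset's topological properties can look like, the sketch below runs a simple permutation test on a toy network. It is not PyGNA's API or method: the function names (`internal_edges`, `permutation_pvalue`), the choice of statistic (number of edges inside the geneset), and the toy edge list are all assumptions made for this example.

```python
import random


def internal_edges(edges, geneset):
    """Count edges whose two endpoints both belong to the geneset."""
    members = set(geneset)
    return sum(1 for u, v in edges if u in members and v in members)


def permutation_pvalue(edges, geneset, n_permutations=1000, seed=0):
    """Empirical p-value: how often does a random node set of the same size
    contain at least as many internal edges as the observed geneset?"""
    rng = random.Random(seed)
    # Sorted so that sampling is reproducible for a given seed.
    nodes = sorted({node for edge in edges for node in edge})
    # Keep only genes that are actually present in the network.
    members = [gene for gene in set(geneset) if gene in nodes]
    observed = internal_edges(edges, members)
    hits = 0
    for _ in range(n_permutations):
        sample = rng.sample(nodes, len(members))
        if internal_edges(edges, sample) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_permutations + 1)


# Hypothetical toy network: a triangle (A, B, C) attached to a chain.
toy_edges = [
    ("A", "B"), ("B", "C"), ("A", "C"),
    ("C", "D"), ("D", "E"), ("E", "F"), ("F", "G"), ("G", "H"),
]
observed, p_value = permutation_pvalue(toy_edges, ["A", "B", "C"])
print(observed, p_value)
```

A small p-value here only says that the geneset is more densely connected than random node sets of the same size in this toy graph; real analyses would typically also need to account for node degree and network size.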

    Dissecting the heritable risk of breast cancer: From statistical methods to susceptibility genes

    Decades of research have shown that rare, highly penetrant mutations can promote tumorigenesis, but it is still unclear whether variants observed at high frequency in the broader population could modulate the risk of developing cancer. Genome-wide Association Studies (GWAS) have generated a wealth of data linking single nucleotide polymorphisms (SNPs) to increased cancer risk, but the effects of these mutations are usually subtle, leaving most of cancer heritability unexplained. Understanding the role of high-frequency mutations in cancer can provide new intervention points for early diagnostics, patient stratification and treatment in malignancies with high prevalence, such as breast cancer. Here we review state-of-the-art methods to study cancer heritability using GWAS data and provide an updated map of breast cancer susceptibility loci at the SNP and gene level.

    Context-dependent neocentromere activity in synthetic yeast chromosome VIII

    Pioneering advances in genome engineering, and specifically in genome writing, have revolutionized the field of synthetic biology, propelling us toward the creation of synthetic genomes. The Sc2.0 project aims to build the first fully synthetic eukaryotic organism by assembling the genome of Saccharomyces cerevisiae. With the completion of synthetic chromosome VIII (synVIII) described here, this goal is within reach. In addition to writing the yeast genome, we sought to manipulate an essential functional element: the point centromere. By relocating the native centromere sequence to various positions along chromosome VIII, we discovered that the minimal 118-bp CEN8 sequence is insufficient for conferring chromosomal stability at ectopic locations. Expanding the transplanted sequence to include a small segment (~500 bp) of the CDEIII-proximal pericentromere improved chromosome stability, demonstrating that minimal centromeres display context-dependent functionality.

    Consequences of a telomerase-related fitness defect and chromosome substitution technology in yeast synIX strains

    We describe the complete synthesis, assembly, debugging, and characterization of a synthetic 404,963 bp chromosome, synIX (synthetic chromosome IX). Combined chromosome construction methods were used to synthesize and integrate its left arm (synIXL) into a strain containing previously described synIXR. We identified and resolved a bug affecting expression of EST3, a crucial gene for telomerase function, producing a synIX strain with near wild-type fitness. To facilitate future synthetic chromosome consolidation and increase flexibility of chromosome transfer between distinct strains, we combined chromoduction, a method to transfer a whole chromosome between two strains, with conditional centromere destabilization to substitute a chromosome of interest for its native counterpart. Both steps of this chromosome substitution method were efficient. We observed that wild-type II tended to co-transfer with synIX and was co-destabilized with wild-type IX, suggesting a potential gene dosage compensation relationship between these chromosomes.

    Debugging and consolidating multiple synthetic chromosomes reveals combinatorial genetic interactions

    The Sc2.0 project is building a eukaryotic synthetic genome from scratch. A major milestone has been achieved with all individual Sc2.0 chromosomes assembled. Here, we describe the consolidation of multiple synthetic chromosomes using advanced endoreduplication intercrossing with tRNA expression cassettes to generate a strain with 6.5 synthetic chromosomes. The 3D chromosome organization and transcript isoform profiles were evaluated using Hi-C and long-read direct RNA sequencing. We developed CRISPR Directed Biallelic URA3-assisted Genome Scan, or “CRISPR D-BUGS,” to map phenotypic variants caused by specific designer modifications, known as “bugs.” We first fine-mapped a bug in synthetic chromosome II (synII) and then discovered a combinatorial interaction associated with synIII and synX, revealing an unexpected genetic interaction that links transcriptional regulation, inositol metabolism, and tRNASer CGA abundance. Finally, to expedite consolidation, we employed chromosome substitution to incorporate the largest chromosome (synIV), thereby consolidating >50% of the Sc2.0 genome in one strain.

    Manipulating the 3D organization of the largest synthetic yeast chromosome

    Whether synthetic genomes can power life has attracted broad interest in the synthetic biology field. Here, we report de novo synthesis of the largest eukaryotic chromosome thus far, synIV, a 1,454,621-bp yeast chromosome resulting from extensive genome streamlining and modification. We developed megachunk assembly combined with a hierarchical integration strategy, which significantly increased the accuracy and flexibility of synthetic chromosome construction. Besides the drastic sequence changes, we further manipulated the 3D structure of synIV to explore spatial gene regulation. Surprisingly, we found few gene expression changes, suggesting that positioning inside the yeast nucleoplasm plays a minor role in gene regulation. Lastly, we tethered synIV to the inner nuclear membrane via its hundreds of loxPsym sites and observed transcriptional repression of the entire chromosome, demonstrating chromosome-wide transcription manipulation without changing the DNA sequences. Our manipulation of the spatial structure of synIV sheds light on higher-order architectural design of the synthetic genomes.

    The Network Zoo: a multilingual package for the inference and analysis of gene regulatory networks

    Inference and analysis of gene regulatory networks (GRNs) require software that integrates multi-omic data from various sources. The Network Zoo (netZoo; netzoo.github.io) is a collection of open-source methods to infer GRNs, conduct differential network analyses, estimate community structure, and explore the transitions between biological states. The netZoo builds on our ongoing development of network methods, harmonizing the implementations in various computing languages and between methods to allow better integration of these tools into analytical pipelines. We demonstrate its utility using multi-omic data from the Cancer Cell Line Encyclopedia. We will continue to expand the netZoo to incorporate additional methods.