Methods for Epigenetic Analyses from Long-Read Sequencing Data
Epigenetics, particularly the study of DNA methylation, is a cornerstone field for our understanding of human development and disease.
DNA methylation has been included in the "hallmarks of cancer" due to its important function as a biomarker and its contribution to carcinogenesis and cancer cell plasticity.
Long-read sequencing technologies, such as the Oxford Nanopore Technologies platform, have advanced the study of structural variation while also allowing direct measurement of DNA methylation on the same reads.
With this, new avenues of analysis have opened up, such as long-range allele-specific methylation analysis, methylation analysis on structural variations, or relating nearby epigenetic modalities on the same read to one another.
Basecalling and methylation calling of Nanopore reads are computationally expensive tasks that require complex machine learning architectures.
Read-level methylation calls require different approaches to data management and analysis than those developed for methylation frequencies measured from short-read technologies or array data.
Read- and genome-associated DNA methylation calls are two-dimensional and carry methylation caller uncertainties, which makes them much more costly to store than one-dimensional methylation frequencies.
Methods for storage, retrieval, and analysis of such data therefore require careful consideration.
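To make the storage contrast concrete, the following minimal sketch collapses read-level calls into per-site methylation frequencies. The tuple layout, the log-likelihood-ratio (LLR) sign convention, and the threshold of 2.0 are illustrative assumptions, not the format or defaults of any particular methylation caller.

```python
from collections import defaultdict

# Hypothetical read-level calls: (read_id, genomic_position, llr).
# Assumed convention: llr > 0 favours methylated, llr < 0 favours unmethylated.
calls = [
    ("read1", 100, 3.1),
    ("read1", 120, -4.0),
    ("read2", 100, 2.5),
    ("read2", 120, 0.3),   # low-confidence call
    ("read3", 100, -2.8),
]


def site_frequencies(calls, llr_threshold=2.0):
    """Collapse 2D (read x position) calls into 1D per-site frequencies.

    Calls with |llr| below the threshold are treated as uncertain and skipped.
    Returns {position: (methylated_count, confident_count, frequency)}.
    """
    counts = defaultdict(lambda: [0, 0])  # position -> [methylated, total]
    for _read_id, pos, llr in calls:
        if abs(llr) < llr_threshold:
            continue
        counts[pos][1] += 1
        if llr > 0:
            counts[pos][0] += 1
    return {pos: (m, n, m / n) for pos, (m, n) in sorted(counts.items())}


freqs = site_frequencies(calls)
```

The per-site summary no longer records which read each call came from or how confident it was, which is the information that read-level analyses such as allele-specific methylation depend on.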
Downstream analysis tasks, such as methylation segmentation or differential methylation calling, can benefit from read-level information and from the propagation of methylation calling uncertainties.
These avenues had not been considered in existing tools.
In my work, I explored the potential of long-read DNA methylation analysis and tackled some of the challenges of data management and downstream analysis using state-of-the-art software architecture and machine learning methods.
I defined a storage standard for reference-anchored and read-assigned DNA methylation calls, including methylation calling uncertainties and read annotations such as haplotype or sample information.
This storage container is defined as a schema for the Hierarchical Data Format version 5 (HDF5), includes an index for rapid access to genomic coordinates, and is optimized for parallel computing with even load balancing.
It further includes a Python API for creation, modification, and data access, including convenience functions for extracting important quality statistics via a command line interface.
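The idea of a coordinate index over chunked, position-sorted calls can be sketched in plain Python. This is not the HDF5 schema or the Python API described in the thesis; the class `ChunkedCallStore`, its `query` method, and the chunk size are hypothetical names and values chosen purely for illustration.

```python
from bisect import bisect_left, bisect_right


class ChunkedCallStore:
    """Toy in-memory stand-in for a chunked, coordinate-indexed call container."""

    def __init__(self, calls, chunk_size=2):
        # calls: iterable of (position, read_id, llr); sorted by position here.
        ordered = sorted(calls)
        self.chunks = [ordered[i:i + chunk_size]
                       for i in range(0, len(ordered), chunk_size)]
        # Index: first and last genomic position covered by each chunk.
        self.starts = [chunk[0][0] for chunk in self.chunks]
        self.ends = [chunk[-1][0] for chunk in self.chunks]

    def query(self, start, end):
        """Return calls with start <= position <= end, scanning only overlapping chunks."""
        first = bisect_left(self.ends, start)    # first chunk ending at or after start
        last = bisect_right(self.starts, end)    # one past the last chunk starting at or before end
        hits = []
        for chunk in self.chunks[first:last]:
            hits.extend(call for call in chunk if start <= call[0] <= end)
        return hits


store = ChunkedCallStore([
    (100, "read1", 3.1),
    (120, "read1", -4.0),
    (100, "read2", 2.5),
    (150, "read2", 0.3),
    (200, "read3", -2.8),
])
region = store.query(110, 160)
```

Because every chunk holds the same number of calls (except possibly the last), handing one chunk to each worker is one simple way to keep parallel workloads roughly even.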
Furthermore, I developed software solutions for the segmentation and differential methylation testing of DNA methylation calls from Nanopore sequencing.
This implementation takes advantage of the performance benefits provided by my high-performance storage container.
It includes a Bayesian methylome segmentation algorithm that allows for the consensus instance segmentation of multiple sample- and/or haplotype-assigned DNA methylation profiles while considering methylation calling uncertainties.
Based on this segmentation, the software can then perform differential methylation testing and provides a large number of options for statistical testing and multiple testing correction.
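As one illustration of multiple testing correction applied to per-segment p-values, the sketch below implements the Benjamini-Hochberg procedure. The source does not say which corrections the software offers; the choice of this procedure, the function name `benjamini_hochberg`, and the example p-values are assumptions.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values, one per segment.
segment_p_values = [0.001, 0.04, 0.03, 0.5]
q_values = benjamini_hochberg(segment_p_values)
significant = [q <= 0.05 for q in q_values]
```

In this example only the first segment stays below a 0.05 false discovery rate threshold after adjustment, although three of the four raw p-values are below 0.05.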
I benchmarked all tools on both simulated and publicly available real data and demonstrated their performance benefits over previously existing and concurrently developed solutions.
Next, I applied the methods to a cancer study on a chromothriptic cancer sample from a patient with Sonic Hedgehog Medulloblastoma.
Here, I report regulatory genomic regions that are differentially methylated before and after treatment, allele-specific methylation in the tumor, and methylation on chromothriptic structures.
Finally, I developed specialized methylation callers for the combined DNA methylation profiling of CpG, GpC, and context-free adenine methylation.
These callers can be used to measure chromatin accessibility in a NOMe-seq-like setup, showing the potential of long-read sequencing for the profiling of transcription factor co-binding.
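In a NOMe-seq-style analysis, cytosines are usually separated by sequence context before interpretation. The sketch below assumes the common convention that GpC methylation (GCH) reads out accessibility, CpG methylation (HCG) reflects endogenous methylation, and GCG sites are ambiguous. It only labels contexts in a sequence string; it is not one of the signal-level callers described here, and the function name `classify_cytosines` is hypothetical.

```python
def classify_cytosines(sequence):
    """Label each forward-strand cytosine by its NOMe-seq-style context.

    Returns a list of (index, label) with label in
    {"CpG", "GpC", "ambiguous", "other"}.
    """
    seq = sequence.upper()
    labels = []
    for i, base in enumerate(seq):
        if base != "C":
            continue
        preceded_by_g = i > 0 and seq[i - 1] == "G"
        followed_by_g = i + 1 < len(seq) and seq[i + 1] == "G"
        if preceded_by_g and followed_by_g:
            label = "ambiguous"  # GCG: the two signals cannot be separated
        elif followed_by_g:
            label = "CpG"        # HCG: assumed endogenous methylation
        elif preceded_by_g:
            label = "GpC"        # GCH: assumed accessibility readout
        else:
            label = "other"
        labels.append((i, label))
    return labels


contexts = classify_cytosines("ACGTGCAGCGTC")
```

Cytosines at the very start or end of the string are labeled from the neighbors that are visible, so a real analysis would need flanking reference sequence. Adenine methylation is described as context-free and therefore needs no such classification.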
In conclusion, this thesis presents and subsequently benchmarks new algorithmic and infrastructural solutions for the analysis of DNA methylation data from long-read sequencing.
An ex vivo system to study cellular dynamics underlying mouse peri-implantation development
Successful three-dimensional recapitulation of mouse peri-implantation embryonic development. Kyoto University press release. 2022-02-09. Upon implantation, mammalian embryos undergo major morphogenesis and key developmental processes such as body axis specification and gastrulation. However, limited accessibility obscures the study of these crucial processes. Here, we develop an ex vivo Matrigel-collagen-based culture to recapitulate mouse development from E4.5 to E6.0. Our system not only recapitulates embryonic growth, axis initiation, and overall 3D architecture in 49% of the cases, but its compatibility with light-sheet microscopy also enables the study of cellular dynamics through automatic cell segmentation. We find that, upon implantation, release of the increasing tension in the polar trophectoderm is necessary for its constriction and invagination. The resulting extra-embryonic ectoderm plays a key role in growth, morphogenesis, and patterning of the neighboring epiblast, which subsequently gives rise to all embryonic tissues. This 3D ex vivo system thus offers unprecedented access to peri-implantation development for in toto monitoring, measurement, and spatiotemporally controlled perturbation, revealing a mechano-chemical interplay between extra-embryonic and embryonic tissues.
Screenshot of the front end of the database consisting of a convenient table of clickable queries and examples of the corresponding input and output parameters.
Characterization of the immunophenotypes and antigenomes of colorectal cancers reveals distinct tumor escape mechanisms and novel targets for immunotherapy
Background: While large-scale cancer genomic projects are comprehensively characterizing the mutational spectrum of various cancers, so far little attention has been devoted either to defining the antigenicity of these mutations or to characterizing the immune responses they elicit. Here we present a strategy to characterize the immunophenotypes and the antigenome of human colorectal cancer. Results: We apply our strategy to a large colorectal cancer cohort (n = 598) and show that subpopulations of tumor-infiltrating lymphocytes are associated with distinct molecular phenotypes. The characterization of the antigenome shows that a large number of cancer-germline antigens are expressed in all patients. In contrast, neo-antigens are rarely shared between patients, indicating that cancer vaccination requires an individualized strategy. Analysis of the genetic basis of the tumors reveals distinct tumor escape mechanisms for the patient subgroups. Hypermutated tumors are depleted of immunosuppressive cells and show upregulation of immunoinhibitory molecules. Non-hypermutated tumors are enriched with immunosuppressive cells, and the expression of immunoinhibitors and MHC molecules is downregulated. Reconstruction of the interaction network of tumor-infiltrating lymphocytes and immunomodulatory molecules followed by a validation with 11 independent cohorts (n = 1,945) identifies BCMA as a novel druggable target. Finally, linear regression modeling identifies major determinants of tumor immunogenicity, which include well-characterized modulators as well as a novel candidate, CCR8, which is then tested in an orthologous immunodeficient mouse model. Conclusions: The immunophenotypes of the tumors and the cancer antigenome remain widely unexplored, and our findings represent a step toward the development of personalized cancer immunotherapies.
Detailed results of SIMPLEX evaluation.
Listed are key figures (averages) for SE and PE samples.
Mandatory pipeline parameters.
Listed are all parameters that need to be specified when starting the pipeline. If PE data is given, the file names need to end with _R1 or _R2.
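The paired-end naming rule can be checked before a run with a small helper such as the one below. This is a sketch and not part of the pipeline: the function name `pair_reads` is hypothetical, and it assumes that the _R1/_R2 suffix sits directly before the file extension (for example `sample_R1.fastq`), which the caption does not state.

```python
import re

# Assumption: the _R1/_R2 suffix sits directly before the file extension(s),
# e.g. "sample_R1.fastq" or "sample_R2.fastq.gz".
_PE_PATTERN = re.compile(
    r"^(?P<sample>.+)_R(?P<mate>[12])(?P<ext>(\.[A-Za-z0-9]+)*)$"
)


def pair_reads(file_names):
    """Group paired-end files by sample name.

    Raises ValueError if a name lacks the _R1/_R2 suffix or a mate is missing.
    Returns {sample: (r1_file, r2_file)}.
    """
    pairs = {}
    for name in file_names:
        match = _PE_PATTERN.match(name)
        if match is None:
            raise ValueError(f"{name!r} does not end with _R1 or _R2")
        pairs.setdefault(match["sample"], {})[match["mate"]] = name
    for sample, mates in pairs.items():
        if set(mates) != {"1", "2"}:
            raise ValueError(f"sample {sample!r} is missing a mate file")
    return {sample: (mates["1"], mates["2"]) for sample, mates in pairs.items()}


paired = pair_reads(["s1_R1.fastq", "s1_R2.fastq"])
```

Failing early on a misnamed or unpaired file is cheaper than discovering the problem partway through an alignment run.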
Coordinates of clusters in Figures 2A and S1A, B.
Rows and columns were divided so as to result in a single cluster family per area, when possible.
GO categories versus ordered hierarchical clusters CIM.
(A) Compact version. The full version is available as Figures S1A, B (http://www.plosone.org/article/info:doi/10.1371/journal.pone.0030317#pone.0030317.s001). Only categories with FDR < 0.10 for at least one cut are represented. The coordinates of the clusters (e.g., R1, C1) are shown in Table 1 (http://www.plosone.org/article/info:doi/10.1371/journal.pone.0030317#pone-0030317-t001). The HTGM FDRs for the GO categories for the 20-, 40-, 80-, and 160-cuts are given in green, blue, pink, and red, respectively. A bright shade corresponds to high correlation (i.e., a low FDR), and a darker shade corresponds to an FDR close to the threshold of 0.10. The cluster numbers for the 160-cuts are shown at the right of each encircled grouping. (B) Blowup of the cluster 52 family grouping derived from Figure 2A (http://www.plosone.org/article/info:doi/10.1371/journal.pone.0030317#pone-0030317-g002).
Description of output files.
Listed are key intermediate and final results that are created by the pipeline.