19 research outputs found

Computational Methods for Structural Variation Analysis in Populations

Author: Kirsche Melanie
Publication venue: 'The Busan Gyeongnam Mathematical Society'
Publication date: 25/07/2022

Recent advances in long-read sequencing have given us an unprecedented view of structural variants (SVs). However, much of their role in disease and evolution remains unknown due to a number of technical and biological challenges, including the high error rate of most long-read sequencing data, the additional complexity of aligning around large variants, and biological differences in how the same SV can manifest in different individuals. In this thesis we introduce novel methods for structural variant analysis and demonstrate how they overcome many of these obstacles. First, we apply recent advances in data structures to the substring search problem and show how learned index structures can enable accelerated alignment of genomic reads. Next, we present an optimized SV calling pipeline that integrates improvements to existing software alongside two novel SV-processing methods, Iris and Jasmine, which improve the accuracy of SV breakpoints and sequences in individual samples and compare and integrate SV calls from multiple samples. Finally, we show how the introduction of CHM13, the first gap-free telomere-to-telomere human reference genome, enables for the first time variant calling in over 100 Mbp of newly resolved sequence and mitigates long-standing issues in variant calling that were attributed to gaps, errors, and minor alleles in the prior GRCh38 reference. We demonstrate the broad applicability of our advancements in SV inference by uncovering novel associations with gene expression in 444 human individuals from the 1000 Genomes Project, by detecting SVs in the tomato genome which affect fruit size and yield, and by comparing SVs between tumor and normal cells in organoids derived from the SKBR3 breast cancer cell line.

Johns Hopkins University

JScholarship

Jasmine: Population-scale structural variant comparison and analysis

Author: Aganezov Sergey
Kirsche Melanie
Ni Bohan
Prabhu Gautam
Schatz Michael
Sherman Rachel
Publication venue: 'Cold Spring Harbor Laboratory'
Publication date: 28/05/2021

The increasing availability of long reads is revolutionizing studies of structural variants (SVs). However, because SVs vary across individuals and are discovered through imprecise read technologies and methods, they can be difficult to compare. Addressing this, we present Jasmine (https://github.com/mkirsche/Jasmine), a fast and accurate method for SV refinement, comparison, and population analysis. Using an SV proximity graph, Jasmine outperforms five widely used comparison methods, including reducing the rate of Mendelian discordance in trio datasets by more than five-fold, and reveals a set of high-confidence de novo SVs confirmed by multiple long-read technologies. We also present a harmonized callset of 205,192 SVs from 31 samples of diverse ancestry sequenced with long reads. We genotype these SVs in 444 short-read samples from the 1000 Genomes Project with both DNA and RNA sequencing data and assess their widespread impact on gene expression, including within several medically relevant genes.

Cold Spring Harbor Laboratory Institutional Repository

Automated assembly scaffolding using RagTag elevates a new tomato system for high-throughput genome editing

Author: Aganezov Sergey
Alonge Michael
Jenike Katie
Kirsche Melanie
Lebeigle Ludivine
Lippman Zachary B
Ou Shujun
Schatz Michael C
Soyk Sebastian
Wang Xingang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/12/2022

Advancing crop genomics requires efficient genetic systems enabled by high-quality personalized genome assemblies. Here, we introduce RagTag, a toolset for automating assembly scaffolding and patching, and we establish chromosome-scale reference genomes for the widely used tomato genotype M82 along with Sweet-100, a new rapid-cycling genotype that we developed to accelerate functional genomics and genome editing in tomato. This work outlines strategies to rapidly expand genetic systems and genomic resources in other plant species.

Cold Spring Harbor Laboratory Institutional Repository

Directory of Open Access Journals

Serveur académique lausannois

PubMed Central

Genomic diversity of SARS-CoV-2 during early introduction into the Baltimore-Washington metropolitan area.

Author: Ernlund Amanda
Evans Jared D
Falade-Nwulia Oluwaseun
Fan Yunfan
Gniazdowski Victoria
Hopkins Mark
Howser Craig
Kirsche Melanie
Lessler Justin
Mehoke Thomas
Morris C Paul
Mostafa Heba H
Ramakrishnan Srividya
Ray Stuart C
Sadowski Norah
Sauer Lauren
Schatz Michael C
Schwartz Matthew
Thielen Peter M
Timp Winston
Trovão Nídia S
Wohl Shirlee
Publication venue: 'American Society for Clinical Investigation'
Publication date: 01/01/2021

The early COVID-19 pandemic was characterized by rapid global spread. In Maryland and Washington, DC, United States, more than 2500 cases were reported within 3 weeks of the first COVID-19 detection in March 2020. We aimed to use genomic sequencing to understand the initial spread of SARS-CoV-2 - the virus that causes COVID-19 - in the region. We analyzed 620 samples collected from the Johns Hopkins Health System during March 11-31, 2020, comprising 28.6% of the total cases in Maryland and Washington, DC. From these samples, we generated 114 complete viral genomes. Analysis of these genomes alongside a subsampling of over 1000 previously published sequences showed that the diversity in this region rivaled global SARS-CoV-2 genetic diversity at that time and that the sequences belong to all of the major globally circulating lineages, suggesting multiple introductions into the region. We also analyzed these regional SARS-CoV-2 genomes alongside detailed clinical metadata and found that clinically severe cases had viral genomes belonging to all major viral lineages. We conclude that efforts to control local spread of the virus were likely confounded by the number of introductions into the region early in the epidemic and the interconnectedness of the region as a whole.

Cold Spring Harbor Laboratory Institutional Repository

Directory of Open Access Journals

Carolina Digital Repository

Multi-tissue integrative analysis of personal epigenomes

Author: Adrian Jessika
Aganezov Sergey
Balderrama-Gutierrez Gabriela
Banskota Samridhi
Bernstein Bradley
Berthel Ana
Borsari Beatrice
Cameron Christopher
Chang Justin
Chee Sora
Chen Zhanlin
Cherry Michael
Chhetri Surya
Choudhary Jyoti
Corona Guillermo
Danyko Cassidy
Davis Carrie
Dobin Alexander
Drenkow Jorg
Epstein Charles
Farid Daniel
Farrell Nina
Gabdank Idan
Galeev Timur
Gao Jiahao
Gaskell Elizabeth
Gerstein Mark
Gillis Jesse
Gingeras Thomas
Gofin Yoel
Gorkin David
Gu Mengting
Guigo Roderic
Gursoy Gamze
Hecht Vivian
Hitz Benjamin
Issner Robbyn
Kirsche Melanie
Kong Xiangmeng
Lam Bonita
Levine Morgan
Li Bian
Li Shantao
Li Tianxiao
Li Xiqi
Lin Khine
Liu Jason
Luo Ruibang
Mackiewicz Mark
Martins Gabriel
Mendenhall Eric
Milosavljevic Aleksandar
Moore Jill
Mortazavi Ali
Mudge Jonathan
Myers Richard
Navarro Fabio
Nelson Nicholas
Noble William
Nusbaum Chad
Popov Ioann
Pratt Henry
Qiu Yunjiang
Ramakrishnan Srividya
Raymond Joe
Ren Bing
Rozowsky Joel
Salichos Leonidas
Scavelli Alexandra
Schatz Michael
Schreiber Jacob
Sedlazeck Fritz
See Lei
Sherman Rachel
Shi Minyi
Shi Xu
Shoresh Noam
Sloan Cricket
Snyder Michael
Strattan Seth
Sun Maxwell
Tan Zhen
Tanaka Forrest
Vlasova Anna
Wang Jun
Weng Zhiping
Werner Jonathan
Williams Brian
Wold Barbara
Wright James
Xiong Kun
Xu Jinrui
Xu Min
Yan Chengfei
Yang Yucheng
Yu Keyang
Yu Lu
Zaleski Christopher
Zhang Jing
Publication venue: 'Cold Spring Harbor Laboratory'
Publication date: 26/04/2021

Evaluating the impact of genetic variants on transcriptional regulation is a central goal in biological science that has been constrained by reliance on a single reference genome. To address this, we constructed phased, diploid genomes for four cadaveric donors (using long-read sequencing) and systematically charted noncoding regulatory elements and transcriptional activity across more than 25 tissues from these donors. Integrative analysis revealed over a million variants with allele-specific activity, coordinated, locus-scale allelic imbalances, and structural variants impacting proximal chromatin structure. We relate the personal genome analysis to the ENCODE encyclopedia, annotating allele- and tissue-specific elements that are strongly enriched for variants impacting expression and disease phenotypes. These experimental and statistical approaches, and the corresponding EN-TEx resource, provide a framework for personalized functional genomics.

Cold Spring Harbor Laboratory Institutional Repository

Caltech Authors

Democratizing long-read genome assembly.

Author: Kirsche Melanie
Schatz Michael C
Publication venue: 'Elsevier BV'
Publication date: 20/10/2021

De novo assembled genomes serve as the backbone for modern genomics. In an article in this issue of Cell Systems, Ekim et al. present the mdBG assembler that can assemble genomes 100-fold faster than previous methods, including a human genome in under 10 min, which unlocks pan-genomics for many species.

Cold Spring Harbor Laboratory Institutional Repository

Sapling: accelerating suffix array queries with learned data models.

Author: Das Arun
Kirsche Melanie
Schatz Michael C
Publication venue: 'Oxford University Press (OUP)'
Publication date: 05/05/2021

MOTIVATION: As genomic data becomes more abundant, efficient algorithms and data structures for sequence alignment become increasingly important. The suffix array is a widely used data structure to accelerate alignment, but the binary search algorithm used to query it requires widespread memory accesses, causing a large number of cache misses on large datasets. RESULTS: Here, we present Sapling, an algorithm for sequence alignment, which uses a learned data model to augment the suffix array and enable faster queries. We investigate different types of data models, providing an analysis of different neural network models as well as providing an open-source aligner with a compact, practical piecewise linear model. We show that Sapling outperforms both an optimized binary search approach and multiple widely used read aligners on a diverse collection of genomes, including human, bacteria and plants, speeding up the algorithm by more than a factor of two while adding <1% to the suffix array's memory footprint. AVAILABILITY AND IMPLEMENTATION: The source code and tutorial are available open-source at https://github.com/mkirsche/sapling. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Crossref

Cold Spring Harbor Laboratory Institutional Repository
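
The idea in the Sapling abstract above, a learned model that predicts where in the suffix array a query should fall so that only a small window needs to be searched, can be illustrated with a toy sketch. This is not Sapling's implementation: it assumes a non-empty DNA text over the alphabet A, C, G, T, builds the suffix array naively, and uses a single linear model rather than the piecewise linear model the abstract mentions. The names `LearnedSuffixIndex`, `encode`, and `find` are hypothetical and chosen only for this illustration.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumption: text and queries use only these letters


def encode(s, k):
    """Base-4 integer for the first k characters of s (short strings padded with 'A')."""
    value = 0
    for i in range(k):
        value = value * 4 + (CODE[s[i]] if i < len(s) else 0)
    return value


class LearnedSuffixIndex:
    """Toy suffix array with a linear model mapping k-mer keys to array positions."""

    def __init__(self, text, k=4):
        self.text = text
        self.k = k
        # Naive construction; adequate for a small illustrative text only.
        self.sa = sorted(range(len(text)), key=lambda i: text[i:])
        n = len(self.sa)
        # Linear model: key 0 maps to position 0, the largest key to position n - 1.
        self.scale = (n - 1) / (4 ** k - 1)
        # Largest prediction error over all suffixes; it bounds the search window.
        self.max_err = max(
            (abs(self._predict(encode(text[p:], k)) - i) for i, p in enumerate(self.sa)),
            default=0,
        )

    def _predict(self, key):
        return round(key * self.scale)

    def find(self, query):
        """Return the start of one occurrence of query in the text, or -1 if absent."""
        n, m = len(self.sa), len(query)
        if m >= self.k:
            # Any suffix that starts with the query shares its k-mer key,
            # so it must lie within max_err of the predicted position.
            guess = self._predict(encode(query, self.k))
            lo = max(0, guess - self.max_err)
            hi = min(n, guess + self.max_err + 1)
        else:
            # The error bound does not hold for queries shorter than k; search everything.
            lo, hi = 0, n
        # Lower-bound binary search restricted to the window [lo, hi).
        while lo < hi:
            mid = (lo + hi) // 2
            if self.text[self.sa[mid]:self.sa[mid] + m] < query:
                lo = mid + 1
            else:
                hi = mid
        if lo < n and self.text[self.sa[lo]:self.sa[lo] + m] == query:
            return self.sa[lo]
        return -1


index = LearnedSuffixIndex("ACGTACGTTGCA", k=4)
position = index.find("GTTG")
```

In this sketch the binary search touches only the window around the predicted position instead of the whole array, which is the source of the reduced memory traffic described in the abstract. The window is guaranteed to contain every match only because the maximum prediction error is measured over all suffixes when the index is built.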

Automated assembly scaffolding elevates a new tomato system for high-throughput genome editing

Author: Aganezov Sergey
Alonge Michael
Kirsche Melanie
Lebeigle Ludivine
Lippman Zachary
Schatz Michael
Soyk Sebastian
Wang Xingang
Publication venue: 'Cold Spring Harbor Laboratory'
Publication date: 19/11/2021

Cold Spring Harbor Laboratory Institutional Repository

GIAB Benchmarking of HG002 Assemblies from HPRC Year 1 Bakeoff

Author: Collins Joanna
Ebert Peter
Formenti Giulio
Garg Shilpa
Harvey William
Hastie Alex
Haukness Marina
Kirsche Melanie
Kolmogorov Mikhail
Koren Sergey
Korlach Jonas
Li Daofeng
Lucas Julian
Luo Feng
Marschall Tobias
McDaniel Jennifer
Nie Fan
Olson Nathan D.
Regier Allison
Rhie Arang
Sanders Ashley D.
Schmitt Anthony
Shafin Kishwar
Shumate Alaina
Stober Catherine
Torrance James
Wang Jianxin
Wood Jonathan
Zimin Aleksey V.
Zook Justin M.
Publication venue: Clemson University Libraries
Publication date: 08/06/2022

Clemson University: TigerPrints