Identifying micro-inversions using high-throughput sequencing reads
Background: The identification of inversions of DNA segments shorter than the read length (e.g., 100 bp), defined as micro-inversions (MIs), remains challenging with next-generation sequencing reads. MIs are acknowledged to be an important type of genomic variation and may play a role in causing genetic disease. However, current alignment methods are generally insensitive to MIs. Here we develop a novel tool, MID (Micro-Inversion Detector), to identify MIs in human genomes using next-generation sequencing reads. Results: The MID algorithm is based on a dynamic programming path-finding approach. Unlike other variant detection tools, MID can handle small MIs and multiple breakpoints within an unmapped read. Moreover, MID improves reliability on low-coverage data by integrating multiple samples. Our evaluation demonstrated that MID outperforms Gustaf, which can currently detect inversions from 30 bp to 500 bp. Conclusions: To our knowledge, MID is the first method that can efficiently and reliably identify MIs from unmapped short next-generation sequencing reads. MID is reliable on low-coverage data, which makes it suitable for large-scale projects such as the 1000 Genomes Project (1KGP). MID identified previously unknown MIs from the 1KGP that overlap with genes and regulatory elements in the human genome. We also identified MIs in cancer cell lines from the Cancer Cell Line Encyclopedia (CCLE). Our tool is therefore expected to improve the study of MIs as a type of genetic variant in the human genome. The source code can be downloaded from: http://cqb.pku.edu.cn/ZhuLab/MID. Funding: NCI NIH HHS [CA182360, R33 CA182360]; NHGRI NIH HHS [HG007352, R01 HG007352].
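To make the notion of a micro-inversion concrete, the following is a minimal Python sketch, not the MID algorithm itself (which the abstract describes as a dynamic programming path-finding approach). It assumes a read and a reference window of equal length that differ only by one reverse-complemented segment; the function names and the `min_len` parameter are hypothetical and chosen for illustration.

```python
# Illustrative sketch only: a simplified check, not MID's dynamic programming method.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_micro_inversion(read, ref, min_len=4):
    """Return (start, end) of a reference segment whose reverse complement
    explains every mismatch between read and ref, or None.

    Assumes len(read) == len(ref) and at most one inversion (hypothetical
    simplification). The interval is half-open: ref[start:end].
    """
    if len(read) != len(ref):
        return None
    mismatches = [i for i, (a, b) in enumerate(zip(read, ref)) if a != b]
    if not mismatches:
        return None
    # Mismatches caused by a reverse complement are symmetric about the
    # segment's centre, so the first and last mismatch bound a candidate.
    start, end = mismatches[0], mismatches[-1] + 1
    if end - start < min_len:
        return None
    if reverse_complement(ref[start:end]) == read[start:end]:
        return (start, end)
    return None


ref = "ACGTTGCAAGGCTTAACCGGTACG"
read = ref[:8] + reverse_complement(ref[8:16]) + ref[16:]
print(find_micro_inversion(read, ref))
```

A read carrying only a single-base substitution yields a candidate interval shorter than `min_len` and is therefore not reported as an inversion under these assumptions.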
Detection of Genomic Inversion from Single End Read
Structural variations (SVs) are genomic rearrangements that include both copy-number variants, such as insertions, deletions, and duplications, and balanced variants, such as inversions and translocations. SVs are receiving increasing attention because of their role in human phenotypes, genetic diseases, and genomic rearrangements. The evolution of next-generation sequencing has provided golden opportunities to investigate these variants and to obtain a wider and clearer picture of their spectrum in the human genome. This investigation includes identifying the type of each SV and its breakpoints at base-pair level. For effective identification and breakpoint resolution, many techniques have been devised, mainly based on paired-end reads. With relatively low cost and high efficiency, different platforms, including Ion Torrent and Illumina, can generate high-throughput single-end reads. In this thesis we provide a novel approach based on single-end reads to detect genomic inversions in the human genome. We also compare our approach with existing methods based on paired-end reads and show that it is competitive in terms of sensitivity and precision at relatively low coverage for detecting the breakpoints of genomic inversions
Genomic approaches to understanding population divergence and speciation in birds
© 2016 American Ornithologists' Union. The widespread application of high-throughput sequencing in studying evolutionary processes and patterns of diversification has led to many important discoveries. However, the barriers to utilizing these technologies and interpreting the resulting data can be daunting for first-time users. We provide an overview and a brief primer of relevant methods (e.g., whole-genome sequencing, reduced-representation sequencing, sequence-capture methods, and RNA sequencing), as well as important steps in the analysis pipelines (e.g., loci clustering, variant calling, whole-genome and transcriptome assembly). We also review a number of applications in which researchers have used these technologies to address questions related to avian systems. We highlight how genomic tools are advancing research by discussing their contributions to 3 important facets of avian evolutionary history. We focus on (1) general inferences about biogeography and biogeographic history, (2) patterns of gene flow and isolation upon secondary contact and hybridization, and (3) quantifying levels of genomic divergence between closely related taxa. We find that in many cases, high-throughput sequencing data confirms previous work from traditional molecular markers, although there are examples in which genome-wide genetic markers provide a different biological interpretation. We also discuss how these new data allow researchers to address entirely novel questions, and conclude by outlining a number of intellectual and methodological challenges as the genomics era moves forward
Characterising chromosome rearrangements: recent technical advances in molecular cytogenetics
Genomic rearrangements can result in losses, amplifications, translocations and inversions of DNA fragments thereby modifying genome architecture, and potentially having clinical consequences. Many genomic disorders caused by structural variation have initially been uncovered by early cytogenetic methods. The last decade has seen significant progression in molecular cytogenetic techniques, allowing rapid and precise detection of structural rearrangements on a whole-genome scale. The high resolution attainable with these recently developed techniques has also uncovered the role of structural variants in normal genetic variation alongside single-nucleotide polymorphisms (SNPs). We describe how array-based comparative genomic hybridisation, SNP arrays, array painting and next-generation sequencing analytical methods (read depth, read pair and split read) allow the extensive characterisation of chromosome rearrangements in human genomes
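The read-pair method mentioned above can be illustrated with a short Python sketch. It assumes a standard forward-reverse (FR) paired-end library and uses a hypothetical `ReadPair` record and a hypothetical `max_insert` threshold; real callers combine many supporting pairs with read-depth and split-read evidence, so this is a simplified illustration rather than any specific tool's logic.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    """Hypothetical minimal record of two aligned mates."""
    chrom1: str
    pos1: int
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str  # "+" or "-"


def classify_pair(pair, max_insert=600):
    """Label one pair by the rearrangement its alignment pattern suggests.

    Assumes an FR library; max_insert is an illustrative threshold.
    """
    if pair.chrom1 != pair.chrom2:
        return "translocation"
    if pair.strand1 == pair.strand2:
        # Both mates on the same strand: one end lies inside an inversion.
        return "inversion"
    left, right = sorted([(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)])
    if left[1] == "-":
        # Leftmost mate on the reverse strand: mates point away from each other.
        return "tandem_duplication"
    if right[0] - left[0] > max_insert:
        # Mates map too far apart: sequence between them is missing in the sample.
        return "deletion"
    return "concordant"


print(classify_pair(ReadPair("chr5", 1000, "+", "chr5", 1300, "-")))
print(classify_pair(ReadPair("chr5", 1000, "+", "chr5", 9000, "+")))
```

Under these assumptions, the first pair is reported as concordant and the second as an inversion signature.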
A study of the characteristics and mechanism of tandem DHFR gene amplification in the anticancer drug (MTX)-resistant HT-29 cell line using genome and transcriptome analysis
Thesis (Ph.D.) -- Seoul National University Graduate School: College of Medicine, Department of Biomedical Sciences, 2019. 2. Jong-Il Kim.
Massively parallel sequencing, known as next-generation sequencing (NGS), has been developed and refined for cancer genome research to obtain molecular-level findings and treatments for disease. The time and cost of NGS analysis have been greatly reduced, so mechanisms ranging from the basic mechanisms of human evolution to the complex mechanisms by which genetic changes drive the resistance of cancer cells to anti-cancer drugs have been comprehensively investigated through advances in NGS technologies. The combination of these NGS technologies has therefore contributed to cancer diagnosis, management, and treatment by elucidating molecular tumor profiles, and it is expected to play an important role in the future of cancer treatment and of personalized medicine in cancer research.
DHFR gene amplification is present in methotrexate (MTX)-resistant colon cancer cells and in acute lymphoblastic leukemia. The chromosome 5q14 region contains many genes in addition to DHFR, and little is known about DHFR gene amplification at this position, because quantifying the amplification size and recognizing the repetitive rearrangements involved require extra time and effort given limited technologies and bioinformatics. Also, there is no clear way to assemble the complete structure of the amplified region with short reads (read length < repeat length). Single-molecule real-time sequencing, which provides exceptionally long reads, has the potential to overcome these limitations and allow complete assembly of the region.
Here I propose an integrative framework to quantify the amplified region and detect structural variations, which are large, complex DNA segments involving repeats, using a combination of technologies including single-molecule real-time sequencing, next-generation optical mapping, and high-throughput chromosome conformation capture (Hi-C).
An amplification unit of 11 genes, spanning from DHFR to ATP6AP1L on chromosome 5 (~2.2 Mbp), and a tandemly amplified region about twentyfold longer than in the control were identified by optical mapping and single-molecule real-time sequencing, and its abnormally increased expression and complicated splicing patterns were characterized with RNA sequencing data. A novel inversion (chr5:80,618,750-80,631,409) was detected at the DHFR gene in the amplified region, which might stimulate chromosomal breakage leading to gene amplification.
Using Hi-C, high adjusted interaction frequencies with significant adjusted p-values, indicating inter-chromosomal contact, were detected on the amplified unit and at an unexpected position on 5q in the MTX-resistant HT-29 sample compared with the control. This suggests that the chromosomal structure from the start of the amplified unit (80.6 Mb-82.8 Mb) to the end of 5q (109 Mb-138 Mb) could form a complex network of spatial contacts that harbors the gene amplification. In addition, an increased relative copy number, several newly identified topologically associating domains (TADs), and extrachromosomal double minutes (DMs) in this amplified region, which were not detected by other technologies, were identified and described to explore their association with the gene amplification mechanism.
Interestingly, novel frameshift insertions were identified in most of the MSH and MLH genes; these could cause dysregulation of the mismatch repair pathway under MTX treatment and play an important role in the rapid progression of gene amplification as well as in resistance to MTX. The variable sizes of the tandem gene amplification patterns, together with homogeneously staining chromosome regions (HSRs) and extrachromosomal DMs, suggest that the gene amplification might have been produced by breakage-fusion-bridge (BFB) cycles.
Overall, the characterized tandem gene amplification unit, the more complicated interactions within chromosome 5, the inversion of the amplification unit, and the mutations in MSH and MLH genes may be critical factors for identifying the mechanism of genomic rearrangement, and these findings may give new insight into the mechanism underlying the amplification process and the evolution of drug resistance. The comprehensive approach of combining advanced technologies is therefore a powerful tool for interpreting cancer genomes, and it can provide the depth of insight needed to identify the most important therapeutic mechanisms and new targets for anti-cancer drugs.
Abstract in Korean (translated): Massively parallel sequencing, known as next-generation sequencing (NGS), has been developed and advanced to obtain new molecular-level findings and treatments for disease in the cancer genome. The time and cost of NGS analysis have fallen sharply, and advances in NGS analysis have enabled comprehensive study of mechanisms ranging from the basic mechanisms of human evolution to the complex mechanisms associated with genetic changes in cancer cells that show anticancer drug resistance. Combinations of these NGS technologies have therefore contributed to cancer research for diagnosis, management, and treatment by elucidating molecular-level tumor profiles, and they will play an important role in the future of personalized medicine in cancer treatment and cancer research.
DHFR gene amplification is present in colon cancer cells resistant to the anticancer drug methotrexate (MTX) and also in acute lymphoblastic leukemia. The chromosome 5q14 region contains many genes and is known to be the origin of gene amplification when colon cancer cells resist methotrexate, but little was known about the actual genomic changes. Earlier analyses used short-read sequencing, but short reads cannot resolve repetitive regions or identify junction reads, so there was no clear way to assemble the full structure of the amplified region.
Single-molecule real-time (PacBio SMRT) sequencing, which provides exceptionally long reads, overcomes these limitations and enables complete assembly of the genomic sequence in repetitive regions. In this study, new genetic analysis technologies, namely single-molecule real-time sequencing, next-generation optical mapping, and high-throughput chromosome conformation capture (Hi-C), which measures the three-dimensional (3D) organization of DNA, were used to characterize the genome amplification process in the methotrexate-resistant colon cancer cell line HT-29, and an integrative framework is proposed for detecting structural variations in repetitive sequences involving large, complex DNA segments.
Single-molecule real-time sequencing and optical mapping were used to assemble the repetitive genomic sequence completely. The 11 genes spanning 2.2 Mbp from DHFR to ATP6AP1L on chromosome 5 were confirmed to form the amplification unit and to be amplified in tandem order, about 20 times longer than in the control. Comparison of gene expression levels and RNA splicing patterns with the control showed abnormal gene expression in the amplification unit, ranging from 5-fold to 122-fold, accompanied by complex RNA splicing patterns.
Based on the analysis of the 3D organization of DNA, which reveals chromosome structure, the extent of interaction between genes within the chromosome was examined. Compared with the control, several topologically associating domains (TADs) were newly found at the center and ends of the amplified region in the methotrexate-resistant colon cancer cell line HT-29; in these parts the adjusted interaction values were high and statistically significant (p<0.05). In addition, double minutes, which are difficult to detect, were found.
Interestingly, frameshift insertions in the MSH and MLH genes caused genetic instability and dysregulation of the mismatch repair pathway under methotrexate conditions; an inverted duplication at the DHFR gene position led to chromosome breakage at that position on chromosome 5; and it could be inferred that homogeneously staining regions (HSRs) with amplified genes of various sizes were produced by breakage-fusion-bridge (BFB) cycles.
In summary, the more complex chromosomal interactions within chromosome 5 and the inversion within the amplification unit can be important factors in identifying the mechanism of genomic rearrangement, and these findings are judged to offer new insight not only into the mechanism underlying gene amplification but also into the principle of anticancer drug resistance in cancer cells. An analysis approach that combines next-generation sequencing with various new advanced technologies is therefore a powerful tool for interpreting cancer genomes, and it is expected to have a large impact on the development of precision medicine, because it can identify the key therapeutic mechanisms of cancer treatment and set new targets for anticancer drugs.
Abstract
Contents
List of Tables
List of Figures
List of Abbreviations
Introduction
Material and Methods
Results
Discussion
References
Abstract in Korean
A Review of Copy Number Variants in Inherited Neuropathies
The rapid development in the last 10-15 years of microarray technologies, such as oligonucleotide array Comparative Genomic Hybridization (CGH) and Single Nucleotide Polymorphism (SNP) genotyping arrays, has improved the identification of fine chromosomal structural variants, ranging in length from kilobases (kb) to megabases (Mb), as an important cause of genetic differences among healthy individuals and also as disease-susceptibility and/or disease-causing factors. Structural genomic variations due to unbalanced chromosomal rearrangements are known as Copy-Number Variants (CNVs), and these include variably sized deletions, duplications, triplications and translocations. CNVs can significantly contribute to human diseases, and rearrangements in several dosage-sensitive genes have been identified as an important causative mechanism in the molecular aetiology of Charcot-Marie-Tooth (CMT) disease and of several CMT-related disorders, a group of inherited neuropathies with a broad range of clinical phenotypes, inheritance patterns and causative genes. Duplications or deletions of the dosage-sensitive gene PMP22, mapped to chromosome 17p12, represent the most frequent causes of CMT type 1A and Hereditary Neuropathy with liability to Pressure Palsies (HNPP), respectively. Additionally, CNVs have been identified in patients with other CMT types (e.g., CMT1X, CMT1B, CMT4D) and different hereditary poly- (e.g., giant axonal neuropathy) and focal- (e.g., hereditary neuralgic amyotrophy) neuropathies, supporting the notion of hereditary peripheral nerve diseases as possible genomic disorders and making the identification of fine chromosomal rearrangements crucial in the molecular assessment of such patients. Notably, the application of advanced computational tools to the analysis of Next-Generation Sequencing (NGS) data has emerged in recent years as a powerful technique for identifying, on a genome-wide scale, complex structural variants (e.g., those resulting from balanced rearrangements) and also smaller pathogenic (intragenic) CNVs that often remain beyond the detection limit of most conventional genomic microarray analyses; in the context of inherited neuropathies, where more than 70 disease-causing genes have been identified to date, NGS and particularly Whole-Genome Sequencing (WGS) hold the potential to reduce the number of genomic assays required per patient to reach a diagnosis, analyzing with a single test all the Single Nucleotide Variants (SNVs) and CNVs in the genes possibly implicated in this heterogeneous group of disorders