16,702 research outputs found

    Stretching the Rules: Monocentric Chromosomes with Multiple Centromere Domains

    The centromere is a functional chromosome domain that is essential for faithful chromosome segregation during cell division and that can be reliably identified by the presence of the centromere-specific histone H3 variant CenH3. In monocentric chromosomes, the centromere is characterized by a single CenH3-containing region within a morphologically distinct primary constriction. This region usually spans up to a few Mbp composed mainly of centromere-specific satellite DNA common to all chromosomes of a given species. In holocentric chromosomes, there is no primary constriction; the centromere is composed of many CenH3 loci distributed along the entire length of a chromosome. Using correlative fluorescence light microscopy and high-resolution electron microscopy, we show that pea (Pisum sativum) chromosomes exhibit remarkably long primary constrictions that contain 3-5 explicit CenH3-containing regions, a novelty in centromere organization. In addition, we estimate that the size of the chromosome segment delimited by two outermost domains varies between 69 Mbp and 107 Mbp, several factors larger than any known centromere length. These domains are almost entirely composed of repetitive DNA sequences belonging to 13 distinct families of satellite DNA and one family of centromeric retrotransposons, all of which are unevenly distributed among pea chromosomes. We present the centromeres of Pisum as novel "meta-polycentric" functional domains. Our results demonstrate that the organization and DNA composition of functional centromere domains can be far more complex than previously thought, do not require single repetitive elements, and do not require single centromere domains in order to segregate properly. Based on these findings, we propose Pisum as a useful model for investigation of centromere architecture and the still poorly understood role of repetitive DNA in centromere evolution, determination, and function.

    The Transcriptional Landscape of Marek’s Disease Virus in Primary Chicken B Cells Reveals Novel Splice Variants and Genes

    Marek’s disease virus (MDV) is an oncogenic alphaherpesvirus that infects chickens and poses a serious threat to poultry health. In infected animals, MDV efficiently replicates in B cells in various lymphoid organs. Despite many years of research, the viral transcriptome in primary target cells of MDV has remained unknown. In this study, we uncovered the transcriptional landscape of the very virulent RB1B strain and the attenuated CVI988/Rispens vaccine strain in primary chicken B cells using high-throughput RNA-sequencing. Our data confirmed the expression of known genes, but also identified a novel spliced MDV gene in the unique short region of the genome. Furthermore, de novo transcriptome assembly revealed extensive splicing of viral genes resulting in coding and non-coding RNA transcripts. A novel splicing isoform of MDV UL15 could also be confirmed by mass spectrometry and RT-PCR. In addition, we could demonstrate that the associated transcriptional motifs are highly conserved and closely resemble those of the host transcriptional machinery. Taken together, our data allow a comprehensive re-annotation of the MDV genome with novel genes and splice variants that could be targeted in further research on MDV replication and tumorigenesis.

    Adaptive genomic structural variation in the grape powdery mildew pathogen, Erysiphe necator.

    Background: Powdery mildew, caused by the obligate biotrophic fungus Erysiphe necator, is an economically important disease of grapevines worldwide. Large quantities of fungicides are used for its control, accelerating the incidence of fungicide resistance. Copy number variations (CNVs) are unbalanced changes in the structure of the genome that have been associated with complex traits. In addition to providing the first description of the large and highly repetitive genome of E. necator, this study describes the impact of genomic structural variation on fungicide resistance in Erysiphe necator. Results: A shotgun approach was applied to sequence and assemble the genome of five E. necator isolates, and RNA-seq and comparative genomics were used to predict and annotate protein-coding genes. Our results show that the E. necator genome is exceptionally large and repetitive and suggest that transposable elements are responsible for genome expansion. Frequent structural variations were found between isolates and included copy number variation in EnCYP51, the target of the commonly used sterol demethylase inhibitor (DMI) fungicides. A panel of 89 additional E. necator isolates collected from diverse vineyard sites was screened for copy number variation in the EnCYP51 gene and for presence/absence of a point mutation (Y136F) known to result in higher fungicide tolerance. We show that an increase in EnCYP51 copy number is significantly more likely to be detected in isolates collected from fungicide-treated vineyards. Increased EnCYP51 copy numbers were detected with the Y136F allele, suggesting that an increase in copy number becomes advantageous only after the fungicide-tolerant allele is acquired. We also show that EnCYP51 copy number influences expression in a gene-dose dependent manner and correlates with fungal growth in the presence of a DMI fungicide. Conclusions: Taken together, our results show that CNV can be adaptive in the development of resistance to fungicides by providing increasing quantitative protection in a gene-dosage dependent manner. The results of this work not only demonstrate the effectiveness of using genomics to dissect complex traits in organisms with very limited molecular information, but also may have broader implications for understanding genomic dynamics in response to strong selective pressure in other pathogens with similar genome architectures.

    Computational pan-genomics: status, promises and challenges

    Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient for leveraging the full potential of such rich genomic data sets. Instead, novel, qualitatively different computational methods and paradigms are needed. We will witness the rapid extension of computational pan-genomics, a new sub-area of research in computational biology. In this article, we generalize existing definitions and understand a pan-genome as any collection of genomic sequences to be analyzed jointly or to be used as a reference. We examine already available approaches to construct and use pan-genomes, discuss the potential benefits of future technologies and methodologies and review open challenges from the vantage point of the above-mentioned biological disciplines. As a prominent example for a computational paradigm shift, we particularly highlight the transition from the representation of reference genomes as strings to representations as graphs. We outline how this and other challenges from different application domains translate into common computational problems, point out relevant bioinformatics techniques and identify open problems in computer science. With this review, we aim to increase awareness that a joint approach to computational pan-genomics can help address many of the problems currently faced in various domains.
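
    The shift from string references to graph references mentioned in this abstract can be illustrated with a toy sketch. The class `SequenceGraph`, its methods, and the four-node example are hypothetical names and made-up data chosen for illustration; they are not taken from the article or from any pan-genome toolkit. The sketch assumes an acyclic graph in which each node carries a piece of sequence and each start-to-end path spells one haplotype.

```python
from collections import defaultdict


class SequenceGraph:
    """Toy sequence graph: nodes carry sequence, directed edges join node ids."""

    def __init__(self):
        self.seq = {}                    # node id -> sequence fragment
        self.edges = defaultdict(list)   # node id -> successor node ids

    def add_node(self, node_id, sequence):
        self.seq[node_id] = sequence

    def add_edge(self, src, dst):
        self.edges[src].append(dst)

    def spell_paths(self, start, end):
        """Yield the sequence spelled by every path from start to end.

        Assumes the graph is acyclic; a cyclic graph would not terminate.
        """
        stack = [(start, self.seq[start])]
        while stack:
            node, spelled = stack.pop()
            if node == end:
                yield spelled
                continue
            for nxt in self.edges[node]:
                stack.append((nxt, spelled + self.seq[nxt]))


def build_toy_graph():
    """Build a graph with one single-base variant (a 'bubble') between two flanks."""
    g = SequenceGraph()
    g.add_node(1, "ACGT")  # shared left flank
    g.add_node(2, "A")     # allele carried by the linear reference (made up)
    g.add_node(3, "G")     # alternative allele (made up)
    g.add_node(4, "TTC")   # shared right flank
    for src, dst in [(1, 2), (1, 3), (2, 4), (3, 4)]:
        g.add_edge(src, dst)
    return g


# A single linear reference string would store only one of these two sequences.
haplotypes = sorted(build_toy_graph().spell_paths(1, 4))
print(haplotypes)
```

    A linear reference keeps only one path through such a bubble, whereas the graph keeps both alleles and their shared flanks in one structure.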

    Review of state-of-the-art algorithms for genomics data analysis pipelines

    The advent of big data and advanced genomic sequencing technologies has presented challenges in terms of data processing for clinical use. The complexity of detecting and interpreting genetic variants, coupled with the vast array of tools and algorithms and the heavy computational workload, has made the development of comprehensive genomic analysis platforms crucial to enabling clinicians to quickly provide patients with genetic results. This chapter reviews and describes the pipeline for analyzing massive genomic data using both short-read and long-read technologies, discussing the current state of the main tools used at each stage and the role of artificial intelligence in their development. It also introduces DeepNGS (deepngs.eu), an end-to-end genomic analysis web platform, including its key features and applications.

    InPhaDel: integrative shotgun and proximity-ligation sequencing to phase deletions with single nucleotide polymorphisms.

    Phasing of single nucleotide variants (SNVs) and structural variations into chromosome-wide haplotypes in humans has been challenging, and has required either trio sequencing or restricting phasing to population-based haplotypes. Selvaraj et al. demonstrated that single-individual SNV phasing is possible with proximity-ligated (HiC) sequencing. Here, we demonstrate that HiC can phase structural variants into phased scaffolds of SNVs. Since HiC data is noisy and SV calling is challenging, we applied a range of supervised classification techniques, including Support Vector Machines and Random Forest, to phase deletions. Our approach was demonstrated on deletion calls and phasings on the NA12878 human genome. We used three NA12878 chromosomes and simulated chromosomes to train model parameters. The remaining NA12878 chromosomes withheld from training were used to evaluate phasing accuracy. Random Forest had the highest accuracy and correctly phased 86% of the deletions with allele-specific read evidence. Allele-specific read evidence was found for 76% of the deletions. HiC provides significant read evidence for accurately phasing 33% of the deletions. Also, eight of eight top-ranked deletions phased by only HiC were validated using long-range polymerase chain reaction and Sanger sequencing. Thus, deletions from a single individual can be accurately phased using a combination of shotgun and proximity ligation sequencing. InPhaDel software is available at: http://l337x911.github.io/inphadel/

    Genome dynamics of the human embryonic kidney 293 lineage in response to cell biology manipulations

    The HEK293 human cell lineage is widely used in cell biology and biotechnology. Here we use whole-genome resequencing of six 293 cell lines to study the dynamics of this aneuploid genome in response to the manipulations used to generate common 293 cell derivatives, such as transformation and stable clone generation (293T); suspension growth adaptation (293S); and cytotoxic lectin selection (293SG). Remarkably, we observe that copy number alteration detection could identify the genomic region that enabled cell survival under selective conditions (in this case, ricin selection). Furthermore, we present methods to detect human/vector genome breakpoints and a user-friendly visualization tool for the 293 genome data. We also establish that the genome structure composition is in steady state for most of these cell lines when standard cell culturing conditions are used. This resource enables novel and more informed studies with 293 cells, and we will distribute the sequenced cell lines to this effect.

    Characterization and mechanism of tandem DHFR gene amplification in the anticancer drug (MTX)-resistant HT-29 cell line using genomic and transcriptomic analysis

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Medicine, Department of Biomedical Sciences, 2019. 2. Jong-Il Kim (김종일). The massively parallel sequencing technology known as next-generation sequencing (NGS) has been developed and refined for cancer genome research, where it provides molecular-level findings that inform the treatment of disease. The time and cost of NGS analysis have been greatly reduced, so questions ranging from the basic mechanisms of human evolution to the complicated mechanisms by which genetic changes drive the resistance of cancer cells to anti-cancer drugs have been comprehensively investigated through advances in NGS technologies. The combination of these NGS technologies has therefore contributed to cancer diagnosis, management, and treatment by elucidating molecular tumor profiles, and it is expected to play an important role in the future of cancer treatment and personalized medicine. DHFR gene amplification is present in methotrexate (MTX)-resistant colon cancer cells and in acute lymphoblastic leukemia. The chromosome 5q14 region contains many genes in addition to DHFR, and little is known about DHFR gene amplification at this position, since quantifying the amplification size and recognizing the repetitive rearrangements involved require extra time and effort given limited technologies and bioinformatics. Also, there is no clear way to assemble the complete structure of the amplified region with short reads (read length < repeat length). Single-molecule real-time (PacBio SMRT) sequencing reads (read length > repeat length), which provide exceptionally long read lengths, have the potential to overcome these limitations and allow for complete assembly of the region.
Here I propose an integrative framework to quantify the amplified region and detect structural variations, which are large, complex DNA segments involving repeats, by using a combination of technologies: single-molecule real-time sequencing, next-generation optical mapping, and high-throughput chromosome conformation capture (Hi-C). An amplification unit of 11 genes, spanning from the DHFR gene to the ATP6AP1L gene on chromosome 5 (~2.2 Mbp), and a tandemly amplified region about twentyfold longer than in the control were identified by optical mapping and single-molecule real-time sequencing, and the abnormally increased expression and complicated splicing patterns of the unit were characterized with RNA sequencing data. A novel inversion (chr5:80,618,750-80,631,409) at the DHFR gene in the amplified region was detected, which might stimulate chromosomal breakage for gene amplification. Using Hi-C, high adjusted interaction frequencies, which indicate inter-chromosomal contact, with significant adjusted p-values were detected at the amplified unit and at an unexpected position on 5q in the MTX-resistant HT-29 sample compared to the control. This suggests that the chromosomal structure from the start of the amplified unit (80.6 Mb - 82.8 Mb) to the end of 5q (109 Mb - 138 Mb) could form a complex network of spatial contacts that harbors the gene amplification. In addition, an increased relative copy number, several newly identified topologically associating domains (TADs), and extrachromosomal double minutes (DMs) in this amplified region, none of which were detected by other technologies, were identified and described in order to find their association with the gene amplification mechanism.
Interestingly, novel frameshift insertions in most of the MSH and MLH genes were identified, which could cause dysregulation of the mismatch repair pathway under MTX conditions and play an important role in the rapid progression of gene amplification as well as in resistance to MTX. The variable size of the tandem gene amplification patterns with homogeneously staining chromosome regions (HSRs), together with the extrachromosomal DMs, suggests that the gene amplification might be produced by breakage-fusion-bridge (BFB) cycles. Overall, the characterized tandem gene amplified unit, the more complicated interactions within chromosome 5, the inversion in the amplification unit, and the mutations in MSH and MLH genes can be critical factors for identifying the mechanism of genomic rearrangements, and these findings may give new insight into the mechanism underlying the amplification process and the evolution of drug resistance. Therefore, the comprehensive approach of combining advanced technologies is a powerful tool for interpreting cancer genomes, and it will provide the depth of insight needed to identify the most important therapeutic mechanisms and new targets for anti-cancer drugs.
Abstract in Korean (English translation): The massively parallel sequencing technology known as next-generation sequencing (NGS) has been developed and advanced to obtain new molecular-level findings and treatments for disease within the cancer genome. The time and cost of NGS analysis have now been greatly reduced, and topics from the basic mechanisms of human evolution to the complex mechanisms of genetic alteration in cancer cells showing anti-cancer drug resistance have been comprehensively analyzed through advances in NGS analysis. The combination of these NGS technologies has therefore contributed to cancer research for diagnosis, management, and treatment by elucidating molecular tumor profiles, and it will play an important role in the future of cancer treatment and personalized medicine in cancer research. DHFR gene amplification is present in colon cancer cells resistant to the anti-cancer drug methotrexate (MTX) and also in acute lymphoblastic leukemia. The 5q14 chromosomal region contains many genes and is known to be the source of gene amplification when colon cancer cells show resistance under methotrexate, but little was known about the actual genomic changes. Previous analyses used short-read sequencing, but because short reads cannot resolve repetitive regions or identify junction reads, there was no clear way to assemble the complete structure of the amplified region. Single-molecule real-time (PacBio SMRT) sequencing, which provides exceptionally long reads, overcomes these limitations and enables complete assembly of the genomic sequence of repetitive regions. In this study, new genomic analysis technologies, namely single-molecule real-time sequencing, next-generation optical mapping, and high-throughput chromosome conformation capture (Hi-C), an assay that measures the three-dimensional (3D) organization of DNA, were used to characterize the genome amplification process in the methotrexate-resistant colon cancer cell line (HT-29), and an integrative framework was proposed for detecting structural variations in repeat sequences with large, complex DNA segments. Using single-molecule real-time sequencing and optical mapping, we sought to completely assemble the genomic repeats and confirmed that 11 genes spanning 2.2 Mbp from the DHFR gene to the ATP6AP1L gene on chromosome 5 form the amplification unit and that these genes are amplified in tandem order to about 20 times the length in the control. In addition, comparing gene expression levels and RNA splicing patterns with the control showed abnormal gene expression in the amplification unit ranging from 5-fold to as much as 122-fold, accompanied by complex RNA splicing patterns. Furthermore, when the 3D organization of DNA was measured to determine how strongly genes within the chromosome interact, several topologically associating domains (TADs) were newly found, relative to the control, at the center and end points of the amplified region in the methotrexate-resistant colon cancer cell line (HT-29); in these areas the adjusted interaction values were high and statistically significant (p<0.05). In addition, hard-to-detect double minute chromosomes were found. Interestingly, frameshift insertions in MSH and MLH genes caused genetic instability and dysregulation of the mismatch repair pathway under methotrexate conditions; an inverted duplication at the DHFR locus led to chromosome breakage at the DHFR position on chromosome 5; and it could be inferred that homogeneously staining regions (HSRs) with amplified genes of various sizes were produced by breakage-fusion-bridge (BFB) cycles. Overall, this study concluded that the more complex chromosomal interactions within chromosome 5 and the inversion within the amplification unit can be important factors in identifying the mechanism of genomic rearrangement, and that these findings may provide new insight not only into the mechanism underlying gene amplification but also into how cancer cells acquire anti-cancer drug resistance. Therefore, analysis combining next-generation sequencing with a variety of new advanced technologies is a powerful tool for interpreting cancer genomes and is expected to have a major impact on the development of precision medicine, in that it can identify the key therapeutic mechanisms of cancer treatment and set new targets for anti-cancer drugs.
Contents: Abstract; Contents; List of Tables; List of Figures; List of Abbreviations; Introduction; Material and Methods; Results; Discussion; References; Abstract in Korean.
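
    The abstract reports an increased relative copy number in the amplified region but does not say how it was computed. One common approach, shown here only as a sketch under that assumption, compares baseline-normalised read depth between the resistant sample and the control. The function name `relative_copy_number`, its parameters, and all depth values below are hypothetical and made up for illustration; they are not data from the thesis.

```python
from statistics import median


def relative_copy_number(sample_depths, control_depths,
                         sample_baseline, control_baseline):
    """Return the fold change in copy number of a region, sample vs control.

    Each depth list holds per-window read depths inside the region of interest.
    Each baseline is the typical depth outside the region for that library,
    which corrects for the two libraries being sequenced to different depths.
    """
    sample_norm = median(sample_depths) / sample_baseline
    control_norm = median(control_depths) / control_baseline
    return sample_norm / control_norm


# Made-up window depths for an amplified region (illustrative only).
resistant_region = [590, 610, 600, 605, 595]
control_region = [29, 31, 30, 30, 30]

fold = relative_copy_number(resistant_region, control_region,
                            sample_baseline=30.0, control_baseline=30.0)
print(f"relative copy number: {fold:.1f}x")
```

    The median is used instead of the mean so that a few windows with extreme depth, for example in repeats, do not dominate the estimate.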