151 research outputs found

    Performance of methods to detect genetic variants from bisulphite sequencing data in a non-model species

    The profiling of epigenetic marks like DNA methylation has become a central aspect of studies in evolution and ecology. Bisulphite sequencing is commonly used for assessing genome-wide DNA methylation at single nucleotide resolution, but these data can also provide information on genetic variants like single nucleotide polymorphisms (SNPs). However, bisulphite conversion causes unmethylated cytosines to appear as thymines, complicating the alignment and subsequent SNP calling. Several tools have been developed to overcome this challenge, but there is no independent evaluation of such tools for non-model species, which often lack genomic references. Here, we used whole-genome bisulphite sequencing (WGBS) data from four female great tits (Parus major) to evaluate the performance of seven tools for SNP calling from bisulphite sequencing data. We used SNPs from whole-genome resequencing data of the same samples as baseline SNPs to assess common performance metrics like sensitivity, precision, and the number of true positive, false positive, and false negative SNPs for the full range of variant and genotype quality values. We found clear differences between the tools in either optimizing precision (Bis-SNP), sensitivity (biscuit), or a compromise between both (all other tools). Overall, the choice of SNP caller strongly depends on which performance parameter should be maximized and whether ascertainment bias should be minimized to optimize downstream analysis, highlighting the need for studies that assess such differences. Peer reviewed
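The performance metrics named in this abstract can be illustrated with a minimal sketch. It assumes each SNP is represented as a hypothetical `(chrom, pos, alt)` tuple and that the baseline set plays the role of the resequencing-derived SNPs; the function name `evaluate_calls` and the toy data are illustrative only and are not taken from the study.

```python
def evaluate_calls(called, baseline):
    """Compare called SNPs against baseline SNPs.

    Each SNP is a hypothetical (chrom, pos, alt) tuple; this representation
    is an assumption made for illustration, not the study's data format.
    """
    called, baseline = set(called), set(baseline)
    tp = len(called & baseline)   # true positives: called and in the baseline
    fp = len(called - baseline)   # false positives: called but not in the baseline
    fn = len(baseline - called)   # false negatives: in the baseline but missed
    # Guard against empty denominators so the function never divides by zero.
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "sensitivity": sensitivity, "precision": precision}


# Toy data: four baseline SNPs, three calls, two of which match the baseline.
baseline = [("chr1", 100, "T"), ("chr1", 250, "G"), ("chr2", 40, "A"), ("chr2", 90, "C")]
called = [("chr1", 100, "T"), ("chr1", 250, "G"), ("chr2", 77, "T")]
result = evaluate_calls(called, baseline)
print(result)
```

Repeating this calculation at successive variant or genotype quality thresholds would trace how sensitivity and precision trade off across the quality range, which is the kind of comparison the abstract describes.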

    RUbioSeq+: A multiplatform application that executes parallelized pipelines to analyse next-generation sequencing data

    This is the peer reviewed version of the following article: Computer Methods and Programs in Biomedicine 138 (2016): 73-81, which has been published in final form at http://dx.doi.org/10.1016/j.cmpb.2016.10.008. Background and objective: To facilitate routine analysis and to improve the reproducibility of the results, next-generation sequencing (NGS) analysis requires intuitive, efficient and integrated data processing pipelines. Methods: We have selected well-established software to construct a suite of automated and parallelized workflows to analyse NGS data for DNA-seq (single-nucleotide variants (SNVs) and indels), CNA-seq, bisulfite-seq and ChIP-seq experiments. Results: Here, we present RUbioSeq+, an updated and extended version of RUbioSeq, a multiplatform application that incorporates a suite of automated and parallelized workflows to analyse NGS data. This new version includes: (i) an interactive graphical user interface (GUI) that facilitates its use by both biomedical researchers and bioinformaticians, (ii) a new pipeline for ChIP-seq experiments, (iii) pair-wise comparisons (case–control analyses) for DNA-seq experiments, and (iv) improvements in the parallelized and multithreaded execution options. Results generated by our software have been experimentally validated and accepted for publication. Conclusions: RUbioSeq+ is free and open to all users at http://rubioseq.bioinfo.cnio.es/. M.R-C is funded by the BLUEPRINT Consortium (FP7/ 2007-2013) under grant agreement number 282510. J.M.F is funded by the INB Node 2 - CNIO, a member of Proteored - PRB2-ISCIII and is supported by grant PT13/0001, of the PE I+D+i 2013-2016, funded by ISCIII and FEDER. H.L-F is funded by a postdoctoral fellowship from the Xunta de Galicia. F.F-R and D.G-P are funded by the European Union's Seventh Framework Programme FP7/REGPOT 2012 2013.1 under grant agreement n° 316265 (BIOCAPS) and the "Platform of integration of intelligent techniques for analysis of biomedical information" project (TIN2013-47153-C3-3-R) financed by the Spanish Ministry of Economy and Competitiveness. C.FT is funded by the "Spanish National Youth Guarantee Implementation Plan" (2013/2016) financed by the Spanish Ministry of Economy and Competitiveness

    De novo mutations in SMCHD1 cause Bosma arhinia microphthalmia syndrome and abrogate nasal development

    Bosma arhinia microphthalmia syndrome (BAMS) is an extremely rare and striking condition characterized by complete absence of the nose with or without ocular defects. We report here that missense mutations in the epigenetic regulator SMCHD1 mapping to the extended ATPase domain of the encoded protein cause BAMS in all 14 cases studied. All mutations were de novo where parental DNA was available. Biochemical tests and in vivo assays in Xenopus laevis embryos suggest that these mutations may behave as gain-of-function alleles. This finding is in contrast to the loss-of-function mutations in SMCHD1 that have been associated with facioscapulohumeral muscular dystrophy (FSHD) type 2. Our results establish SMCHD1 as a key player in nasal development and provide biochemical insight into its enzymatic function that may be exploited for development of therapeutics for FSHD.

    Genomic and transcriptomic study of stepwise transformation in the X-irradiated C3H10T1/2 cell line

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Medicine, Department of Biomedical Sciences, 2019. 2. 김종일. Tumourigenesis is initiated by various factors, and its progression is also affected by numerous elements. In 1980, Little and Kennedy performed a malignant cell transformation experiment using x-irradiation to study the initiation and progression of tumourigenesis. Reproducing tumourigenesis in vitro is challenging, yet they accomplished the task, and almost 40 years later I address the questions that experiment left unresolved. First, genomic alterations were investigated using whole-genome sequencing. Although the malignant transformation process is a recreation of tumourigenesis in vitro, the genomic alterations were not as striking as I expected, which was intriguing. The generation of foci clearly implied the presence of DNA damage, yet the changes that had occurred at this stage were not far advanced. I then studied the transcriptome changes that arose due to x-irradiation. Using principal component analysis and admixture analysis, I showed that the transcriptome profiles differed distinctly according to x-irradiation or focus generation status. To investigate what drives focus generation, I analysed differentially expressed genes between un-irradiated cells and irradiated but non-focus cells. Non-focus cells seemed to be at the initial stage of malignant transformation, as their characteristics were closer to focus cells than to un-irradiated cells. In particular, non-focus cells exhibited highly elevated Cdkn1a, a DNA damage response gene, possibly in response to DNA damage caused by x-irradiation. Down-regulation of TGF-b genes was also observed in non-focus cells, which may be one of the factors that induce focus generation. Moreover, DNA repair related genes including Atm, Atr, Brca1, Brca2, and Chek1 were highly elevated in focus cells. This study showed that the alterations in the transcriptome due to x-irradiation were quite dramatic, whereas the changes in the genome were rather mild. 
Furthermore, as stem cells are frequently involved in malignant transformation, I examined whether Focus cells possess stem cell-like characteristics. Focus cells displayed stemness, and up-regulated Myc contributed to oncogenic reprogramming, generating Focus cells with tumourigenic characteristics. In conclusion, I speculate that three factors are involved in the second step of two-step malignant transformation. The primary factor would be down-regulated Tgfb gene expression. This change occurred in Non-focus cells, which are considered to be at the very early stage of tumourigenesis. Decreased Tgfb gene expression led the cells to transform further and induced DNA repair, especially error-prone DNA repair, which would act as the secondary factor. Lastly, with the up-regulated Myc oncogene, the cells were reprogrammed into cancer stem cells and generated foci with malignancy. Taken together, in this study I uncovered three factors that contributed to the second step of two-step malignant transformation, and I believe that these findings will extend the understanding of tumourigenesis.
    Abstract in Korean: The initiation and evolution of tumours are driven by many kinds of alterations. In 1980, Little and Kennedy carried out experiments using X-rays in the C3H10T1/2 cell line to identify the factors that cause cell transformation, the step that leads to tumour formation. Little proposed the two-step theory, suggesting that cell transformation induced by x-irradiation is brought about by two kinds of factors. The first of the two factors was x-irradiation, but the second was not characterised in detail. Now, 40 years later, this study used next-generation sequencing to identify the groups of genes responsible for transformation that could not be identified in the 1980s. First, whole-genome analysis was used to examine the relationship between x-irradiation-induced focus formation and changes in the genome. Copy number variation, mutational signatures, and tumour mutational burden were analysed in cells that were x-irradiated but did not form foci (Non-focus) and in cells that formed foci (Focus). Even when foci formed, there was little genomic change relative to irradiated cells that did not form foci, indicating that x-irradiation did not greatly affect genomic variation. This study therefore used transcriptome analysis to investigate the changes in gene expression patterns involved in x-irradiation-induced focus formation. Principal component analysis and admixture analysis showed that transcriptome patterns differ according to focus formation status, and differential expression analysis was used to identify the factors that induce focus formation. The expression levels and patterns of the un-irradiated control group and the Non-focus group were compared, and the results indicated that the Non-focus group is at the earliest stage of X-ray-induced transformation. The transcriptomic character of the Non-focus group was also found to be closer to the Focus group than to the control group. In particular, Cdkn1a, well known as a DNA damage response gene, was very highly expressed in the Non-focus group in response to DNA damage from X-rays. TGF-b genes, thought to be factors contributing to focus formation, were down-regulated in the Non-focus group. Moreover, expression of Atm, Atr, Brca1, Brca2, and Chek1, known as DNA repair genes, was very highly elevated in the Focus group. DNA repair systems were elevated overall in the Focus group, and the error-prone DNA repair system in particular was highly elevated. When a cell is damaged by an external stimulus, the error-free system operates first, and the error-prone system is activated when the error-free system is overloaded. Elevation of the error-prone system was also observed in this study; DNA repaired by the error-prone system is thought not to be restored properly, eventually leading to focus formation. Analysis of gene expression patterns showed that the genomic changes accompanying focus formation were not pronounced, whereas the transcriptome changes were very marked. From the viewpoint of cancer genome evolution, this suggests that in some cases transcriptome changes occur first, ahead of genomic changes. Finally, previous studies have reported a close link between tumourigenesis and stem cells. On this basis, whether Focus cells have stem cell-like properties was examined, and Focus cells were confirmed to have stemness. Previous reports also indicated that up-regulated Myc can reprogram cells and that Myc-induced tumours show little genomic change. Focus cells had elevated Myc expression and little genomic change, consistent with those studies. Taken together, two factors are involved in the second step of Little's two-step transformation theory: the first is decreased Tgfb gene expression and the second is the elevated error-prone DNA repair system. As in very early tumour cells, decreased Tgfb expression in the Non-focus group contributed to the transformation of Non-focus cells into Focus cells, and cells that were not properly repaired under the error-prone DNA repair system accumulated and eventually formed foci. Lastly, the stem cell-like properties of Focus cells and Myc-induced oncogenic reprogramming are thought to give the transformed Focus cells their malignancy. By clarifying Little's two-step transformation theory, this study is expected to improve the understanding of tumourigenesis.
    Contents: Abstract; Contents; List of Tables; List of Figures; List of Abbreviations; Introduction; Materials and Methods; Results; Discussion; References; Abstract in Korean. Doctor

    Variant calling: Considerations, practices, and developments

    The success of many clinical, association, or population genetics studies critically relies on a properly performed variant calling step. The variety of modern genomics protocols, techniques, and platforms makes the choice of methods and algorithms difficult, and there is no "one size fits all" solution for study design and data analysis. In this review, we discuss considerations that need to be taken into account while designing the study and preparing for the experiments. We outline the variety of variant types that can be detected using sequencing approaches and highlight some specific requirements and basic principles of their detection. Finally, we cover interesting developments that enable variant calling for a broad range of applications in the genomics field. We conclude by discussing technological and algorithmic advances that have the potential to change the ways of calling DNA variants in the near future.

    Bioinformatics from genetic variants to methylation

    An important research topic in bioinformatics is the analysis of DNA, the molecule that encodes the genetic information of all organisms. The basis for this is sequencing, a procedure in which the sequence of DNA bases is determined. In addition to the identification of variations in the base sequence itself, advances in sequencing methods and a steady reduction in sequencing costs open up new fields of research: the analysis of functionally relevant non-base-related changes, so-called epigenetics. An important example of such a mechanism is DNA methylation, a process in which methyl groups are added to DNA without altering the sequence itself. Methylation takes place only at specific sites, and the methylation information of human DNA consists of approximately 30 million methylation levels between 0 and 1 in total. This thesis deals with problems and solutions for each phase of DNA methylation analysis. The most advanced method for detecting DNA methylation based on resolution is Whole-Genome Bisulfite Sequencing (WGBS), a technique that modifies DNA at unmethylated sites. We describe the special in-silico treatment required to process this altered DNA and existing concepts as well as newly developed bioinformatic methods for efficient determination of DNA methylation levels and their further processing with our developed tool camel. A common downstream analysis step is the detection of differentially methylated regions (DMRs), for which we have implemented a modification of the widely used method BSmooth in order to deal with today’s common data sizes. Setting up and creating new sequencing protocols, e.g., the mentioned WGBS, is complicated and requires adjustments to several parameters. We have developed a method based on a linear program (LP) that can predict the duplicate rate of supersamples. This critical quality measure represents the proportion of redundant data that in most cases needs to be removed from any further analysis. 
By using our method, it becomes possible to test, adjust and improve parameters for small test libraries only and to estimate the duplication rate for potential full-size samples. Once the sequencing protocol has been established, the methylation recognition of camel can be used as part of automated workflows, such as our mosquito workflow. This pipeline processes the generated WGBS samples from the raw data to the degree of methylation, including all essential intermediate steps. Such workflows are one of the central components of bioinformatics since the calculation must be parallel, reproducible and scalable. The distribution of the detected methylation levels, e.g., values of several samples at a specific location, can often be described as a beta-mixture model. The standard approach for estimating the parameters of such a model, the EM algorithm, has problems with data points of 0 or 1, which are very common as methylation levels. For this reason, we have developed an alternative algorithm based on moments that overcomes this disadvantage. It is robust for data points within the closed interval [0, 1] and can also be applied to similar data sets in addition to methylation levels. This work deals not only with epigenetic but also with genetic variants. To analyze these, we present a second pipeline (ape) for data from targeted sequencing, where for example only genes are sequenced. The recognized variants then serve as input for our graphical environment eagle, a tool for computer scientists and geneticists to recognize possible causal genetic variants. The configuration of the analysis and the presentation of the results are done via a graphical user interface. Unlike other tools, eagle is not based on databases, but on encapsulated hdf5 files. The use of this universal file-system-like data structure offers some advantages and makes the system easy to use, especially for non-computer scientists. 
At the end of the thesis, we use all methods presented for the detection, analysis, and characterization of interindividual DMRs between several donors. This leads to some computational challenges because DMR detection is usually performed on two different groups. Our developed approach processes independent samples and calculates key metrics such as p-values and the number of undetectable DMRs. Through genome-wide association studies (GWAS) on more than 1000 array data sets of methylation and variants, we show that (interindividual) DMRs as a subtype of epigenetic variation are related to genetic variation.
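The abstract above notes that moment-based estimation tolerates methylation levels of exactly 0 or 1, where likelihood-based fitting of a beta distribution breaks down. The following is a minimal sketch of that idea for a single beta component only, using the textbook method-of-moments formulas; it is not the thesis's mixture algorithm, and the function name and toy values are illustrative assumptions.

```python
from statistics import fmean, pvariance


def beta_method_of_moments(values):
    """Estimate (alpha, beta) of one beta distribution from sample moments.

    Single-component textbook estimator, shown for illustration; it is an
    assumption-laden sketch, not the mixture algorithm from the thesis.
    Values of exactly 0.0 or 1.0 are fine because only the mean and the
    variance are used, never log(x) or log(1 - x).
    """
    m = fmean(values)
    v = pvariance(values, mu=m)
    # A beta distribution requires 0 < variance < mean * (1 - mean).
    if v <= 0 or v >= m * (1 - m):
        raise ValueError("sample moments are not compatible with a beta distribution")
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common


# Toy methylation levels including the boundary values 0 and 1.
levels = [0.0, 0.1, 0.2, 0.5, 0.8, 0.9, 1.0]
alpha, beta = beta_method_of_moments(levels)
print(alpha, beta)
```

For a mixture, an estimator of this kind would have to be combined with some assignment of data points to components; how the thesis does that is not stated in the abstract.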

    Advances in Single Molecule, Real-Time (SMRT) Sequencing

    PacBio’s single-molecule real-time (SMRT) sequencing technology offers important advantages over the short-read DNA sequencing technologies that currently dominate the market. This includes exceptionally long read lengths (20 kb or more), unparalleled consensus accuracy, and the ability to sequence native, non-amplified DNA molecules. From fungi to insects to humans, long reads are now used to create highly accurate reference genomes by de novo assembly of genomic DNA and to obtain a comprehensive view of transcriptomes through the sequencing of full-length cDNAs. Besides reducing biases, sequencing native DNA also permits the direct measurement of DNA base modifications. Therefore, SMRT sequencing has become an attractive technology in many fields, such as agriculture, basic science, and medical research. The boundaries of SMRT sequencing are continuously being pushed by developments in bioinformatics and sample preparation. This book contains a collection of articles showcasing the latest developments and the breadth of applications enabled by SMRT sequencing technology.

    Advancing the analysis of bisulfite sequencing data in its application to ecological plant epigenetics

    The aim of this thesis is to bridge the gap between the state-of-the-art bioinformatic tools and resources, currently at the forefront of epigenetic analysis, and their emerging applications to non-model species in the context of plant ecology. New, high-resolution research tools are presented; first in a specific sense, by providing new genomic resources for a selected non-model plant species, and also in a broader sense, by developing new software pipelines to streamline the analysis of bisulfite sequencing data, in a manner which is applicable to a wide range of non-model plant species. The selected species is the annual field pennycress, Thlaspi arvense, which belongs in the same lineage of the Brassicaceae as the closely-related model species, Arabidopsis thaliana, and yet does not benefit from such extensive genomic resources. It is one of three key species in a Europe-wide initiative to understand how epigenetic mechanisms contribute to natural variation, stress responses and long-term adaptation of plants. To this end, this thesis provides a high-quality, chromosome-level assembly for T. arvense, alongside a rich complement of feature annotations of particular relevance to the study of epigenetics. The genome assembly encompasses a hybrid approach, involving both PacBio continuous long reads and circular consensus sequences, alongside Hi-C sequencing, PCR-free Illumina sequencing and genetic maps. The result is a significant improvement in contiguity over the existing draft state from earlier studies. Much of the basis for building an understanding of epigenetic mechanisms in non-model species centres around the study of DNA methylation, and in particular the analysis of bisulfite sequencing data to bring methylation patterns into nucleotide-level resolution. In order to maintain a broad level of comparison between T. 
arvense and the other selected species under the same initiative, a suite of software pipelines has also been developed, covering mapping, the quantification of methylation values, differential methylation between groups, and epigenome-wide association studies. Furthermore, presented herein is a novel algorithm which can facilitate accurate variant calling from bisulfite sequencing data using conventional approaches, such as FreeBayes or Genome Analysis ToolKit (GATK), which until now was feasible only with specifically-adapted software. This enables researchers to obtain high-quality genetic variants, often essential for contextualising the results of epigenetic experiments, without the need for additional sequencing libraries alongside. Each of these aspects is thoroughly benchmarked, integrated into a robust workflow management system, and adheres to the principles of FAIR (Findability, Accessibility, Interoperability and Reusability). Finally, further consideration is given to the unique difficulties presented by population-scale data, and a number of concepts and ideas are explored in order to improve the feasibility of such analyses. In summary, this thesis introduces new high-resolution tools to facilitate the analysis of epigenetic mechanisms, specifically relating to DNA methylation, in non-model plant data. In addition, thorough benchmarking standards are applied, showcasing the range of technical considerations which are of principal importance when developing new pipelines and tools for the analysis of bisulfite sequencing data. 
The complete “Epidiverse Toolkit” is available at https://github.com/EpiDiverse and will continue to be updated and improved in the future.
    Contents: ABSTRACT; ACKNOWLEDGEMENTS
    1 INTRODUCTION: 1.1 ABOUT THIS WORK; 1.2 BIOLOGICAL BACKGROUND; 1.2.1 Epigenetics in plant ecology; 1.2.2 DNA methylation; 1.2.3 Maintenance of 5mC patterns in plants; 1.2.4 Distribution of 5mC patterns in plants; 1.3 TECHNICAL BACKGROUND; 1.3.1 DNA sequencing; 1.3.2 The case for a high-quality genome assembly; 1.3.3 Sequence alignment for NGS; 1.3.4 Variant calling approaches
    2 BUILDING A SUITABLE REFERENCE GENOME: 2.1 INTRODUCTION; 2.2 MATERIALS AND METHODS; 2.2.1 Seeds for the reference genome development; 2.2.2 Sample collection, library preparation, and DNA sequencing; 2.2.3 Contig assembly and initial scaffolding; 2.2.4 Re-scaffolding; 2.2.5 Comparative genomics; 2.3 RESULTS; 2.3.1 An improved reference genome sequence; 2.3.2 Comparative genomics; 2.4 DISCUSSION
    3 FEATURE ANNOTATION FOR EPIGENOMICS: 3.1 INTRODUCTION; 3.2 MATERIALS AND METHODS; 3.2.1 Tissue preparation for RNA sequencing; 3.2.2 RNA extraction and sequencing; 3.2.3 Transcriptome assembly; 3.2.4 Genome annotation; 3.2.5 Transposable element annotations; 3.2.6 Small RNA annotations; 3.2.7 Expression atlas; 3.2.8 DNA methylation; 3.3 RESULTS; 3.3.1 Transcriptome assembly; 3.3.2 Protein-coding genes; 3.3.3 Non-coding loci; 3.3.4 Transposable elements; 3.3.5 Small RNA; 3.3.6 Pseudogenes; 3.3.7 Gene expression atlas; 3.3.8 DNA Methylation; 3.4 DISCUSSION
    4 BISULFITE SEQUENCING METHODS: 4.1 INTRODUCTION; 4.2 PRINCIPLES OF BISULFITE SEQUENCING; 4.3 EXPERIMENTAL DESIGN; 4.4 LIBRARY PREPARATION; 4.4.1 Whole Genome Bisulfite Sequencing (WGBS); 4.4.2 Reduced Representation Bisulfite Sequencing (RRBS); 4.4.3 Target capture bisulfite sequencing; 4.5 BIOINFORMATIC ANALYSIS OF BISULFITE DATA; 4.5.1 Quality Control; 4.5.2 Read Alignment; 4.5.3 Methylation Calling; 4.6 ALTERNATIVE METHODS
    5 FROM READ ALIGNMENT TO DNA METHYLATION ANALYSIS: 5.1 INTRODUCTION; 5.2 MATERIALS AND METHODS; 5.2.1 Reference species; 5.2.2 Natural accessions; 5.2.3 Read simulation; 5.2.4 Read alignment; 5.2.5 Mapping rates; 5.2.6 Precision-recall; 5.2.7 Coverage deviation; 5.2.8 DNA methylation analysis; 5.3 RESULTS; 5.4 DISCUSSION; 5.5 A PIPELINE FOR WGBS ANALYSIS
    6 THERE AND BACK AGAIN: INFERRING GENOMIC INFORMATION: 6.1 INTRODUCTION; 6.1.1 Implementing a new approach; 6.2 MATERIALS AND METHODS; 6.2.1 Validation datasets; 6.2.2 Read processing and alignment; 6.2.3 Variant calling; 6.2.4 Benchmarking; 6.3 RESULTS; 6.4 DISCUSSION; 6.5 A PIPELINE FOR SNP VARIANT ANALYSIS
    7 POPULATION-LEVEL EPIGENOMICS: 7.1 INTRODUCTION; 7.2 CHALLENGES IN POPULATION-LEVEL EPIGENOMICS; 7.3 DIFFERENTIAL METHYLATION; 7.3.1 A pipeline for case/control DMRs; 7.3.2 A pipeline for population-level DMRs; 7.4 EPIGENOME-WIDE ASSOCIATION STUDIES (EWAS); 7.4.1 A pipeline for EWAS analysis; 7.5 GENOTYPING-BY-SEQUENCING (EPIGBS); 7.5.1 Extending the epiGBS pipeline; 7.6 POPULATION-LEVEL HAPLOTYPES; 7.6.1 Extending the EpiDiverse/SNP pipeline
    8 CONCLUSION
    APPENDICES: A. SUPPLEMENT: BUILDING A SUITABLE REFERENCE GENOME; B. SUPPLEMENT: FEATURE ANNOTATION FOR EPIGENOMICS; C. SUPPLEMENT: FROM READ ALIGNMENT TO DNA METHYLATION ANALYSIS; D. SUPPLEMENT: INFERRING GENOMIC INFORMATION
    BIBLIOGRAPHY

    Molecular heterogeneity of invasive penile cancer

    Penile cancer is a rare and mutilating disease. Due to the paucity of basic, molecular and translational work, new treatment options have not been forthcoming; the disease has arguably been neglected, and patients have poor outcomes. This thesis explores the molecular biology of advanced squamous cell penile carcinoma by assessing its genetic and epigenetic aberrations and transcriptomic changes. For each patient, five tumour regions were profiled in detail and compared with a matched control sample. When compared with other cancers, penile cancer appears to have a high tumour mutational load with high intra-tumour heterogeneity. Evidence for the clonal integration of HPV into the human genome was found. HPV positive samples are associated with APOBEC mutational changes and increased expression of the DNMT1 and DNMT3A methyltransferases. TP53 was found to be an early clonal driver in the HPV negative samples, whereas mutations in mTOR or PIK3CA were found to be early clonal drivers in HPV positive samples. Potentially targetable mutations, such as those in EGFR, were only ever found to be subclonal in this small cohort. Other targetable mutations that were found to be early and shared throughout the primary tumour included DDR2 and cMET. Increased expression of immune checkpoint inhibitory proteins such as CTLA4 was found throughout all samples, providing preliminary evidence that checkpoint blockade could be effective in penile cancer. These findings suggest that penile cancer is a heterogeneous disease with remarkably different genetic and epigenetic profiles for HPV positive and HPV negative disease. These tumours display large amounts of intra-tumour heterogeneity and so may prove difficult to treat successfully with more traditional targeted therapies against tyrosine kinases. However, there is evidence that immune checkpoint blockade may prove to be efficacious in these patients, and further work should be undertaken to examine this in more depth.

    An ecologist's guide for studying DNA methylation variation in wild vertebrates

    The field of molecular biology is advancing fast with new powerful technologies, sequencing methods and analysis software being developed constantly. Commonly used tools originally developed for research on humans and model species are now regularly used in ecological and evolutionary research. There is also a growing interest in the causes and consequences of epigenetic variation in natural populations. Studying ecological epigenetics is currently challenging, especially for vertebrate systems, because of the required technical expertise, complications with analyses and interpretation, and limitations in acquiring sufficiently high sample sizes. Importantly, neglecting the limitations of the experimental setup, technology and analyses may affect the reliability and reproducibility, and the extent to which unbiased conclusions can be drawn from these studies. Here, we provide a practical guide for researchers aiming to study DNA methylation variation in wild vertebrates. We review the technical aspects of epigenetic research, concentrating on DNA methylation using bisulfite sequencing, discuss the limitations and possible pitfalls, and how to overcome them through rigid and reproducible data analysis. This review provides a solid foundation for the proper design of epigenetic studies, a clear roadmap on the best practices for correct data analysis and a realistic view on the limitations for studying ecological epigenetics in vertebrates. This review will help researchers studying the ecological and evolutionary implications of epigenetic variation in wild populations