16 research outputs found

    Optimization of High and Ultra High Molecular Weight DNA purification for Third Generation Sequencing and Optical Mapping in algae

    The analysis of long DNA molecules by novel genomic technologies, such as Bionano optical mapping and Third Generation Sequencing (including PacBio Single Molecule Real Time sequencing and Oxford Nanopore sequencing), provides the opportunity for complete genome characterization and reconstruction, making it possible to identify large (balanced) structural variants, to determine variant phasing and haplotypes, to sequence full-length repeated regions and to assemble and scaffold genomes de novo. Implementing these technologies requires DNA that is both highly pure and of High Molecular Weight (HMW): >10^5 bp in length for Bionano optical mapping or >10^4 bp for Third Generation Sequencing. However, standardized and suitable extraction methods to obtain highly pure HMW DNA are still missing for many organisms and tissues. In particular, plants and algae store large amounts of phenolic compounds and polysaccharides and carry a high copy number of chloroplast and mitochondrial DNA, making the extraction of genomic DNA that is both pure and HMW challenging. The aim of this work was to optimize methods for purifying highly pure (Ultra)HMW DNA, suitable for Third Generation Sequencing and Bionano optical mapping, from a microalga selected as a case study, Haematococcus pluvialis (H. pluvialis). Although H. pluvialis is a unicellular green microalga extensively studied for industrial applications, a high-quality genome for its biotechnological application is still missing. Therefore, an extensive benchmarking of DNA and nuclei isolation methods was conducted to produce high-quality HMW DNA suitable for generating Third Generation Sequencing and Bionano optical mapping data for the de novo reconstruction of its genome. Four (U)HMW DNA extraction methods and 8 nuclei isolation methods were evaluated independently or in combination. To further improve DNA purity and optimize the production of high-quality sequencing data, 4 post-extraction DNA purification methods were also tested. The methods were compared in terms of the yield, length and purity of the extracted DNA and its analysis by Third Generation Sequencing and optical mapping. Only 3 specific combinations of these protocols yielded DNA suitable for generating successful results with PacBio (CTAB buffer + AMPure XP beads purification), Oxford Nanopore (MEB buffer + G-tip-based DNA extraction) and Bionano (MEB buffer + plug-based DNA extraction). The data produced herein can be used to obtain a highly contiguous genome for H. pluvialis with efficient reconstruction of repetitive genomic portions (which are abundant in the H. pluvialis genome), by eliminating ambiguity in the positions or sizes of genomic elements.

    Identification of genetic risk factors for Parkinson’s disease

    Parkinson's disease (PD) is a common progressive neurodegenerative disorder with a complex and heterogeneous genetic landscape. Approximately 90% of all PD cases are driven by the cumulative effect of several common low-risk genetic variants. In recent years, genetic studies of familial and sporadic PD cases have identified a range of high- and low-risk variants, representing approximately 40% of the estimated heritability. However, the role of structural variants (SVs) in the missing heritability of PD remains understudied. Therefore, we investigated SVs in a human cohort enriched for the PD phenotype to expand our knowledge of putative PD genetic risk factors. We leveraged matching omics datasets obtained from 95 iPSC lines differentiated into a dopaminergic neuronal-like state to run SV calling and to directly assess the impact of SVs on gene and transcript expression. We demonstrated a conceptual approach for genome-wide SV annotation and pathogenicity assessment, addressing the challenges of predicting the functional effects of SVs from the known properties of genome regions and the available multi-omics data. Using this approach, we prioritized a group of non-coding SVs that are absent in the healthy controls and strongly associated with the differential expression of genes whose dysregulation can trigger the development of PD or a PD-related phenotype. The discovered variation impacts molecular mechanisms involved in the regulation of signaling processes, the oxidative stress response, and neuronal DNA repair. Additional analysis of a larger cohort of PD patients and controls is needed to validate the variant-expression associations and to explore the allele effect size and penetrance of the prioritized hits. The dataset is publicly available to facilitate further discovery of SV associations with PD risk, as well as the study of sequence signatures and SV hot spots specific to neurological disease.

    Bioinformatic Pipeline for Determining Terminal Repeats in the Human Cytomegalovirus Genome Assembled with PacBio Long Read Sequences

    Human Cytomegalovirus (HCMV) is a member of the Betaherpesvirinae subfamily of the herpesvirus family. HCMV infection is common among adults worldwide, with an estimated seroprevalence of 66 to 95%, depending on the geographic region (Zuhair et al., 2019). Although most of the viral genome has been studied extensively, the sequences of the terminal repeat region remain understudied. Two main challenges have hindered the study of this region: a) limitations of sequencing technologies; and b) misassembly of the repeats due to their complex nature. Here I show a novel bioinformatics pipeline that takes advantage of PacBio's long reads to resolve these challenges. Implementation of the pipeline yielded results that supported previous assumptions about the terminal region, showed evidence of new findings, and provided an in-depth analysis of the terminal repeat known as the "a" sequence.

    Construction of Red Fox Chromosomal Fragments from the Short-Read Genome Assembly

    The genome of a red fox (Vulpes vulpes) was recently sequenced and assembled using next-generation sequencing (NGS). The assembly is of high quality, with 94X coverage and a scaffold N50 of 11.8 Mbp, but is split into 676,878 scaffolds, some of which are likely to contain assembly errors. Fragmentation and misassembly hinder accurate gene prediction and downstream analyses such as the identification of loci under selection. Therefore, assembly of the genome into chromosome-scale fragments was an important step towards developing this genomic model. Scaffolds from the assembly were aligned to the dog reference genome and compared to the alignment of an outgroup genome (cat) against the dog to identify syntenic sequences among species. The program Reference-Assisted Chromosome Assembly (RACA) then integrated the comparative alignment with the mapping of the raw sequencing reads generated during assembly against the fox scaffolds. The 128 sequence fragments that RACA assembled were compared to the fox meiotic linkage map to guide the construction of 40 chromosomal fragments. This computational approach to assembly was facilitated by prior research in comparative mammalian genomics, and the continued improvement of the red fox genome can in turn offer insight into canid and carnivore chromosome evolution. This assembly is also necessary for advancing genetic research in foxes and other canids.
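
    The scaffold N50 quoted in the abstract above is a standard contiguity statistic: the length L such that scaffolds of length L or longer together contain at least half of all assembled bases. The following is a minimal sketch of that definition, not the authors' code; the function name `n50` and the example lengths are illustrative assumptions.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths (in bp).

    Illustrative helper (hypothetical name, not from the cited work).
    """
    if not lengths:
        raise ValueError("at least one scaffold length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold downwards until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Made-up scaffold lengths, for illustration only.
example_lengths = [2, 3, 4, 5, 6]
print(n50(example_lengths))
```

    Because N50 is weighted by length, an assembly can combine a large N50 with a very high scaffold count when most of the scaffolds are short, which matches the situation the abstract describes.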

    Analysing Microbial Communities

    Anaerobic digestion, the decomposition of organic matter to biogas and digestate in the absence of oxygen, is carried out by diverse communities of microorganisms. Until recently, 16S rRNA gene amplification has been the main approach to better understanding these communities, ultimately for their exploitation in industry and waste management. Metagenomics and shotgun whole genome sequencing now offer a different approach, allowing functional analysis of individual members of the community without the need for cell culturing. But metagenomics is not without its own pitfalls. Currently, few tools and methods are available for use with the large and complex datasets produced by sequencing anaerobic digestion communities. Here we present the development of a rapid, fully automated software pipeline for the large-scale identification and functional analysis of quality genomes extracted from anaerobic digestion metagenomic datasets. The pipeline consists of two new tools for the analysis of metagenomic data: the MCCR tool for reducing contamination in proposed genomes formed from metagenomic data, and the MPP tool for simultaneously predicting metabolic pathways across the large numbers of organisms found in metagenomes. The tools and pipeline were tested on both synthetic and real datasets during their development, and while further development will be needed, this pipeline shows high potential to be both viable and extremely useful in understanding complex metagenomic datasets.

    Evolutionary genomics : statistical and computational methods

    This open access book addresses the challenge of analyzing and understanding the evolutionary dynamics of complex biological systems at the genomic level, and elaborates on some promising strategies that would bring us closer to uncovering the vital relationships between genotype and phenotype. After a few educational primers, the book continues with sections on sequence homology and alignment, phylogenetic methods to study genome evolution, methodologies for evaluating selective pressures on genomic sequences, genomic evolution in light of protein domain architecture and transposable elements, population genomics and other omics, and discussions of current bottlenecks in handling and analyzing genomic data. Written for the highly successful Methods in Molecular Biology series, chapters include the kind of detail and expert implementation advice that lead to the best results. Authoritative and comprehensive, Evolutionary Genomics: Statistical and Computational Methods, Second Edition aims to serve both novices in biology with strong statistics and computational skills, and molecular biologists with a good grasp of standard mathematical concepts, in moving this important field of study forward.

    Legume Genetics and Biology

    Legumes have played an important part as human food and animal feed in cropping systems since the dawn of agriculture. The legume family is arguably one of the most abundantly domesticated crop plant families. Their ability to symbiotically fix nitrogen and improve soil fertility has been valued since antiquity and makes them a key protein source. Pea was the original model organism used in Mendel's discovery of the laws of inheritance, making it the foundation of modern plant genetics. This book, based on a Special Issue, provides up-to-date information on legume biology, genetic advances, and the legacy of Mendel.
