96 research outputs found

    Processing and analyzing multiple genomes alignments with MafFilter

    As the number of available genome sequences from both closely related species and individuals within species increased, theoretical and methodological convergences between the fields of phylogenomics and population genomics emerged. Population genomics typically focuses on the analysis of variants, while phylogenomics heavily relies on genome alignments. However, these are playing an increasingly important role in studies at the population level. Multiple genome alignments of individuals are used when structural variation is of primary interest and when genome architecture permits de novo assembly of genome sequences. Here I describe MafFilter, a command-line-driven program for processing genome alignments in the Multiple Alignment Format (MAF). Using concrete examples based on publicly available datasets, I demonstrate how MafFilter can be used to develop efficient and reproducible pipelines with quality assurance for downstream analyses. I further show how MafFilter can be used to perform both basic and advanced population genomic analyses in order to infer the patterns of nucleotide diversity along genomes

    Mapping and phasing of structural variation in patient genomes using nanopore sequencing

    Despite improvements in genomics technology, the detection of structural variants (SVs) from short-read sequencing still poses challenges, particularly for complex variation. Here we analyse the genomes of two patients with congenital abnormalities using the MinION nanopore sequencer and a novel computational pipeline—NanoSV. We demonstrate that nanopore long reads are superior to short reads with regard to detection of de novo chromothripsis rearrangements. The long reads also enable efficient phasing of genetic variations, which we leveraged to determine the parental origin of all de novo chromothripsis breakpoints and to resolve the structure of these complex rearrangements. Additionally, genome-wide surveillance of inherited SVs reveals novel variants, missed in short-read data sets, a large proportion of which are retrotransposon insertions. We provide a first exploration of patient genome sequencing with a nanopore sequencer and demonstrate the value of long-read sequencing in mapping and phasing of SVs for both clinical and research applications

    Genomic analysis of Sparus aurata reveals the evolutionary dynamics of sex-biased genes in a sequential hermaphrodite fish

    Sexual dimorphism is a fascinating subject in evolutionary biology and mostly results from sex-biased expression of genes, which have been shown to evolve faster in gonochoristic species. We report here genome and sex-specific transcriptome sequencing of Sparus aurata, a sequential hermaphrodite fish. Evolutionary comparative analysis reveals that sex-biased genes in S. aurata are similar in number and function, but evolved following strikingly divergent patterns compared with gonochoristic species, showing overall slower rates because of stronger functional constraints. Fast evolution is observed only for highly ovary-biased genes due to female-specific patterns of selection that are related to the peculiar reproduction mode of S. aurata, first maturing as male, then as female. To our knowledge, these findings represent the first genome-wide analysis on sex-biased loci in a hermaphrodite vertebrate species, demonstrating how having two sexes in the same individual profoundly affects the fate of a large set of evolutionarily relevant genes. Funding: European Union KBBE.2013.1.2-10 European Community 311920 Fondazione Cassa di Risparmio Padova e Rovigo FCT - Foundation for Science and Technology research grant SPARCOMP under the Call ARISTEIA I of the National Strategic Reference Framework - by the EU 36 Hellenic Republic through the European Social Fund

    Validation of Skeletal Muscle cis-Regulatory Module Predictions Reveals Nucleotide Composition Bias in Functional Enhancers

    We performed a genome-wide scan for muscle-specific cis-regulatory modules (CRMs) using three computational prediction programs. Based on the predictions, 339 candidate CRMs were tested in cell culture with NIH3T3 fibroblasts and C2C12 myoblasts for capacity to direct selective reporter gene expression to differentiated C2C12 myotubes. A subset of 19 CRMs validated as functional in the assay. The rate of predictive success reveals striking limitations of computational regulatory sequence analysis methods for CRM discovery. Motif-based methods performed no better than predictions based only on sequence conservation. Analysis of the properties of the functional sequences relative to inactive sequences indicates that nucleotide sequence composition can be an important characteristic to incorporate in future methods for improved predictive specificity. Muscle-related TFBSs predicted within the functional sequences display greater sequence conservation than non-TFBS flanking regions. Comparison with recent MyoD and histone modification ChIP-Seq data supports the validity of the functional regions

    Evolution of a supergene that regulates a trans-species social polymorphism

    Supergenes are clusters of linked genetic loci that jointly affect the expression of complex phenotypes, such as social organization. Little is known about the origin and evolution of these intriguing genomic elements. Here we analyse whole-genome sequences of males from native populations of six fire ant species and show that variation in social organization is under the control of a novel supergene haplotype (termed Sb), which evolved by sequential incorporation of three inversions spanning half of a 'social chromosome'. Two of the inversions interrupt protein-coding genes, resulting in the increased expression of one gene and modest truncation in the primary protein structure of another. All six socially polymorphic species studied harbour the same three inversions, with the single origin of the supergene in their common ancestor inferred by phylogenomic analyses to have occurred half a million years ago. The persistence of Sb along with the ancestral SB haplotype through multiple speciation events provides a striking example of a functionally important trans-species social polymorphism presumably maintained by balancing selection. We found that while recombination between the Sb and SB haplotypes is severely restricted in all species, a low level of gene flux between the haplotypes has occurred following the appearance of the inversions, potentially mitigating the evolutionary degeneration expected at genomic regions that cannot freely recombine. These results provide a detailed picture of the structural genomic innovations involved in the formation of a supergene controlling a complex social phenotype

    Identification of Y-Box Binding Protein 1 As a Core Regulator of MEK/ERK Pathway-Dependent Gene Signatures in Colorectal Cancer Cells

    Transcriptional signatures are an indispensable source of correlative information on disease-related molecular alterations on a genome-wide level. Numerous candidate genes involved in disease and in factors of predictive, as well as of prognostic, value have been deduced from such molecular portraits, e.g. in cancer. However, mechanistic insights into the regulatory principles governing global transcriptional changes are lagging behind extensive compilations of deregulated genes. To identify regulators of transcriptome alterations, we used an integrated approach combining transcriptional profiling of colorectal cancer cell lines treated with inhibitors targeting the receptor tyrosine kinase (RTK)/RAS/mitogen-activated protein kinase pathway, computational prediction of regulatory elements in promoters of co-regulated genes, and chromatin-based and functional cellular assays. We identified commonly co-regulated, proliferation-associated target genes that respond to the MAPK pathway. We recognized E2F and NFY transcription factor binding sites as prevalent motifs in those pathway-responsive genes and confirmed the predicted regulatory role of Y-box binding protein 1 (YBX1) by reporter gene, gel shift, and chromatin immunoprecipitation assays. We also validated the MAPK-dependent gene signature in colorectal cancers and provided evidence for the association of YBX1 with poor prognosis in colorectal cancer patients. This suggests that MEK/ERK-dependent, YBX1-regulated target genes are involved in executing malignant properties

    Sequencing and de novo assembly of 150 genomes from Denmark as a population reference

    Hundreds of thousands of human genomes are now being sequenced to characterize genetic variation and use this information to augment association mapping studies of complex disorders and other phenotypic traits. Genetic variation is identified mainly by mapping short reads to the reference genome or by performing local assembly. However, these approaches are biased against discovery of structural variants and variation in the more complex parts of the genome. Hence, large-scale de novo assembly is needed. Here we show that it is possible to construct excellent de novo assemblies from high-coverage sequencing with mate-pair libraries extending up to 20 kilobases. We report de novo assemblies of 150 individuals (50 trios) from the GenomeDenmark project. The quality of these assemblies is similar to those obtained using the more expensive long-read technology. We use the assemblies to identify a rich set of structural variants including many novel insertions and demonstrate how this variant catalogue enables further deciphering of known association mapping signals. We leverage the assemblies to provide 100 completely resolved major histocompatibility complex haplotypes and to resolve major parts of the Y chromosome. Our study provides a regional reference genome that we expect will improve the power of future association mapping studies and hence pave the way for precision medicine initiatives, which now are being launched in many countries including Denmark

    A heterozygous moth genome provides insights into herbivory and detoxification

    How an insect evolves to become a successful herbivore is of profound biological and practical importance. Herbivores are often adapted to feed on a specific group of evolutionarily and biochemically related host plants, but the genetic and molecular bases for adaptation to plant defense compounds remain poorly understood. We report the first whole-genome sequence of a basal lepidopteran species, Plutella xylostella, which contains 18,071 protein-coding and 1,412 unique genes with an expansion of gene families associated with perception and the detoxification of plant defense compounds. A recent expansion of retrotransposons near detoxification-related genes and a wider system used in the metabolism of plant defense compounds are shown to also be involved in the development of insecticide resistance. This work shows the genetic and molecular bases for the evolutionary success of this worldwide herbivore and offers wider insights into insect adaptation to plant feeding, as well as opening avenues for more sustainable pest management. Authors: Minsheng You … Simon W Baxter … et al

    Accurate detection of complex structural variations using single-molecule sequencing

    Structural variations are the greatest source of genetic variation, but they remain poorly understood because of technological limitations. Single-molecule long-read sequencing has the potential to dramatically advance the field, although high error rates are a challenge with existing methods. Addressing this need, we introduce open-source methods for long-read alignment (NGMLR; https://github.com/philres/ngmlr) and structural variant identification (Sniffles; https://github.com/fritzsedlazeck/Sniffles) that provide unprecedented sensitivity and precision for variant detection, even in repeat-rich regions and for complex nested events that can have substantial effects on human health. In several long-read datasets, including healthy and cancerous human genomes, we discovered thousands of novel variants and categorized systematic errors in short-read approaches. NGMLR and Sniffles can automatically filter false events and operate on low-coverage data, thereby reducing the high costs that have hindered the application of long reads in clinical and research settings