284 research outputs found

    The Dfam community resource of transposable element families, sequence models, and genome annotations.

    Dfam is an open access database of repetitive DNA families, sequence models, and genome annotations. The 3.0-3.3 releases of Dfam (https://dfam.org) represent an evolution from a proof-of-principle collection of transposable element families in model organisms into a community resource for a broad range of species, and for both curated and uncurated datasets. In addition, releases since Dfam 3.0 provide auxiliary consensus sequence models, transposable element protein alignments, and a formalized classification system to support the growing diversity of organisms represented in the resource. The latest release includes 266,740 new de novo generated transposable element families from 336 species contributed by the EBI. This expansion demonstrates the utility of many of Dfam's new features and provides insight into the long-term challenges ahead for improving de novo generated transposable element datasets.

    An ancient retroviral RNA element hidden in mammalian genomes and its involvement in co-opted retroviral gene regulation

    Ancient viruses hidden in mammalian genomes: discovery of a gene regulatory mechanism unique to ancient viruses. Kyoto University press release. 2021-11-10. [Background] Retroviruses utilize multiple unique RNA elements to control RNA processing and translation. However, it is unclear what functional RNA elements are present in endogenous retroviruses (ERVs). Gene co-option from ERVs sometimes entails the conservation of viral cis-elements required for gene expression, which might reveal the RNA regulation in ERVs. [Results] Here, we characterized an RNA element found in ERVs consisting of three specific sequence motifs, called SPRE. The SPRE-like elements were found in different ERV families but not in any exogenous viral sequences examined. We observed more than a thousand copies of the SPRE-like elements in several mammalian genomes; in human and marmoset genomes, they overlapped with lineage-specific ERVs. SPRE was originally found in human syncytin-1 and syncytin-2. Indeed, several mammalian syncytin genes (mac-syncytin-3 of macaque, syncytin-Ten1 of tenrec, and syncytin-Car1 of Carnivora) contained SPRE-like elements. A reporter assay revealed that the enhancement of gene expression by SPRE depended on the reporter genes. Mutation of SPRE impaired wild-type syncytin-2 expression, while the same mutation did not affect codon-optimized syncytin-2, suggesting that SPRE activity depends on the coding sequence. [Conclusions] These results indicate multiple independent invasions of various mammalian genomes by retroviruses harboring SPRE-like elements. Functional SPRE-like elements are found in several syncytin genes derived from these retroviruses. This element may facilitate the expression of viral genes that were suppressed due to inefficient codon frequency or repressive elements within the coding sequences. These findings provide new insights into the long-term evolution of RNA elements and molecular mechanisms of gene expression in retroviruses.

    POLYA: A TOOL FOR ADJUDICATING COMPETING ANNOTATIONS OF BIOLOGICAL SEQUENCES

    Annotation of a biological sequence is usually performed by aligning that sequence to a database of known sequence elements. When that database contains elements that are highly similar to each other, the proper annotation may be ambiguous, because several entries in the database produce high-scoring alignments. Typical annotation methods work by assigning a label based on the candidate annotation with the highest alignment score; this can overstate annotation certainty, mislabel boundaries, and fail to identify large-scale rearrangements or insertions within the annotated sequence. Here, I present a new software tool, PolyA, that adjudicates between competing alignment-based annotations by computing estimates of annotation confidence, identifying a trace with maximal confidence, and recursively splicing/stitching inserted elements. PolyA communicates annotation certainty, identifies large-scale rearrangements, and detects boundaries between neighboring elements.

    A haplotype-resolved draft genome of the European sardine (Sardina pilchardus)

    The European sardine (Sardina pilchardus Walbaum, 1792) is culturally and economically important throughout its distribution. Monitoring studies of sardine populations report an alarming decrease in stocks due to overfishing and environmental change, which has resulted in historically low captures along the Iberian Atlantic coast. Important biological and ecological features such as population diversity, structure, and migratory patterns can be addressed with the development and use of genomics resources. Funding: Portuguese national funds from FCT-Foundation for Science and Technology: UID/Multi/04326/2016; European Regional Development Fund (FEDER): 22153-01/SAICT/2016; ALG-01-0145-FEDER-022121; ALG-01-0145-FEDER-022231; MAR2020 operational programme of the European Maritime and Fisheries Fund (project SARDI-NOMICS): MAR-01.04.02-FEAMP-0024; European Union's Horizon 2020 research and innovation programme: 654008.

    The evolution, distribution and diversity of endogenous circoviral elements in vertebrate genomes

    Circoviruses (family Circoviridae) are small, non-enveloped viruses that have short, single-stranded DNA genomes. Circovirus sequences are frequently recovered in metagenomic investigations, indicating that these viruses are widespread, yet they remain relatively poorly understood. Endogenous circoviral elements (CVe) are DNA sequences derived from circoviruses that occur in vertebrate genomes. CVe are a useful source of information about the biology and evolution of circoviruses. In this study, we screened 362 vertebrate genome assemblies in silico to generate a catalog of CVe loci. We identified a total of 179 CVe sequences, most of which have not been reported previously. We show that these CVe loci reflect at least 19 distinct germline integration events. We determine the structure of CVe loci, identifying some that show evidence of potential functionalization. We also identify orthologous copies of CVe in snakes, fish, birds, and mammals, allowing us to add new calibrations to the timeline of circovirus evolution. Finally, we observed that some ancient CVe group robustly with contemporary circoviruses in phylogenies, with all sequences within these groups being derived from the same host class or order, implying a hitherto underappreciated stability in circovirus-host relationships. The openly available dataset constructed in this investigation provides new insights into circovirus evolution, and can be used to facilitate further studies of circoviruses and CVe.

    Dfam Web Server

    Objective: Establish a new server for my open access web database of transposable element families (Dfam.org), migrate the service to the new server, and upgrade the content of the server in preparation for a substantial collaborative R01.

    SODA: an Open-Source Library for Visualizing Biological Sequence Annotation

    Genome annotation is the process of identifying and labeling known genetic sequences or features within a genome. Across the various subfields within modern molecular biology, there is a common need for the visualization of such annotations. Genomic data is often visualized on web browser platforms, providing users with easy access to visualization tools without the need to install any software or, in many cases, underlying datasets. While there exists a broad range of web-based visualization tools, there is, to my knowledge, no lightweight, modern library tailored towards the visualization of genomic data. Instead, developers charged with the task of producing a novel visualization must either adopt a complex system or fall back on general-purpose visualization frameworks. Here, I present SODA, a web-based genomic annotation visualization library implemented in TypeScript as an abstraction over D3. SODA is designed to be lightweight and flexible, empowering developers with the tools to easily create customized and nuanced genomic visualizations.

    Identification, comprehensive characterization, and comparative genomics of the HERV-K(HML8) integrations in the human genome

    Around 8% of the human genome is composed of Human Endogenous Retroviruses (HERVs), ancient viral sequences inherited from the primate germ line after their infection by now extinct retroviruses. Given the still underexplored physiological and pathological roles of HERVs, it is fundamental to increase our information about the genomic composition of the different groups, to lay a reliable foundation for functional studies. Among HERVs, the most characterized elements belong to the beta-like superfamily HERV-K, comprising 10 groups (HML1-10), with HML2 being the most recent and most studied one. Among HMLs, the HML8 group is the only one still lacking a comprehensive genomic description. In the present work, we investigated the distribution of HML8 sequences in the human genome (GRCh38/hg38), identifying 23 novel proviruses and characterizing the overall 78 HML8 proviruses in terms of genome structure, phylogeny, and integration pattern. HML8 elements were significantly enriched in human chromosomes 8 and X (p<0.005) while chromosomes 17 and 20 showed fewer integrations than expected (p<0.025 and p<0.005, respectively). Phylogenetic analyses classified HML8 members into 3 clusters, corresponding to the three LTR types MER11A, MER11B and MER11C. Besides different LTR types, common signatures in the internal structure suggested the potential existence of three different ancestral HML8 variants. Accordingly, time of integration estimation coupled with comparative genomics revealed that these three clusters have a different time of integration in the primates' genome, with MER11C elements being significantly younger than MER11A- and MER11B-associated proviruses (p<0.005 and p<0.05, respectively). Approximately 30% of the HML8 elements were found co-localized within human genes, sometimes in exonic portions and with the same orientation, deserving further studies for their possible effects on gene expression. Overall, we provide the first detailed picture of the HML8 group distribution and variety among the genome, creating the backbone for the specific analysis of their transcriptional activity in healthy and diseased conditions.

    HERV‐K(HML7) integrations in the human genome: Comprehensive characterization and comparative analysis in non‐human primates

    Endogenous Retroviruses (ERVs) are ancient relics of infections that affected the primate germ line and constitute about 8% of our genome. Growing evidence indicates that ERVs had a major role in vertebrate evolution, being occasionally domesticated by the host physiology. In addition, human ERV (HERV) expression is highly investigated for a possible pathological role, even if no clear associations have been reported yet. In fact, on the one side, the study of HERV expression in high‐throughput data is a powerful and promising tool to assess their actual dysregulation in diseased conditions; but, on the other side, the poor knowledge about the genomic diversity of the various HERV groups and their individual members has so far prevented the association between specific HERV loci and a given molecular mechanism of pathogenesis. The present study is focused on the HERV‐K(HML7) group that, unlike the other HERV‐K members, remains poorly characterized. Starting from an initial identification performed with the software RetroTector, we collected 23 HML7 proviral insertions and about 160 HML7 solitary LTRs that were analyzed in terms of genomic distribution, revealing a significant enrichment in chromosome X and the frequent localization within human gene introns as well as in pericentromeric and centromeric regions. Phylogenetic analyses showed that HML7 members form a monophyletic group, which based on age estimation and comparative localization in non‐human primates had its major diffusion between 20 and 30 million years ago. Structural characterization revealed that besides 3 complete HML7 proviruses, the other group members shared a highly defective structure that, however, still presents recognizable functional domains, making it worth further investigation in the human population to assess the presence of residual coding potential.

    Multiple and diversified transposon lineages contribute to early and recent bivalve genome evolution

    Background: Transposable elements (TEs) can represent one of the major sources of genomic variation across eukaryotes, providing novel raw materials for species diversification and innovation. While considerable effort has been made to study their evolutionary dynamics across multiple animal clades, molluscs represent a substantially understudied phylum. Here, we take advantage of the recent increase in mollusc genomic resources and adopt an automated TE annotation pipeline combined with a phylogenetic tree-based classification, as well as extensive manual curation efforts, to characterize TE repertoires across 27 bivalve genomes with a particular emphasis on DDE/D class II elements, long interspersed nuclear elements (LINEs), and their evolutionary dynamics. Results: We found class I elements to be highly dominant in bivalve genomes, with LINE elements, despite being less represented in terms of copy number per genome, being the most common retroposon group, covering up to 10% of their genome. We mined 86,488 reverse transcriptase (RVT)-containing LINEs coming from 12 clades distributed across all known superfamilies and 14,275 class II DDE/D-containing transposons coming from 16 distinct superfamilies. We uncovered a previously underestimated rich and diverse bivalve ancestral transposon complement that could be traced back to their most recent common ancestor that lived ~500 Mya. Moreover, we identified multiple instances of lineage-specific emergence and loss of different LINEs and DDE/D lineages, with the interesting cases of CR1-Zenon, Proto2, RTE-X, and Academ elements that underwent a bivalve-specific amplification likely associated with their diversification. Finally, we found that this LINE diversity is maintained in extant species by an equally diverse set of long-living and potentially active elements, as suggested by their evolutionary history and transcription profiles in both male and female gonads. Conclusions: We found that bivalves host an exceptional diversity of transposons compared to other molluscs. Their LINE complement could mainly follow a "stealth drivers" model of evolution where multiple and diversified families are able to survive and co-exist for a long period of time in the host genome, potentially shaping both recent and early phases of bivalve genome evolution and diversification. Overall, we provide not only the first comparative study of TE evolutionary dynamics in a large but understudied phylum such as Mollusca, but also a reference library for ORF-containing class II DDE/D and LINE elements, which represents an important genomic resource for their identification and characterization in novel genomes.